Text statistics

  1. Word_frequency
  2. Heaps'_Law
  3. Zipf's_law
  4. Project_topics
  5. Resources

Zipf's law is fundamental to natural language processing (NLP) and information retrieval (IR), and it guides how efficient IR code is written. Many algorithms, word2vec for example, remove stop words and rare words before processing. To understand why, we first need to understand Zipf's law.
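As a minimal sketch of what removing stop words and rare words can look like in practice, the snippet below prunes a vocabulary by frequency alone. The function name `prune_vocabulary`, its parameters `min_count` and `top_k`, and the toy token stream are all hypothetical and chosen for illustration; this is not the exact procedure used by word2vec or any particular library.

```python
from collections import Counter

def prune_vocabulary(counts, min_count=2, top_k=1):
    """Drop the top_k most frequent words (a crude stand-in for stop words)
    and every word seen fewer than min_count times (rare words).
    Hypothetical helper, for illustration only."""
    most_frequent = {word for word, _ in counts.most_common(top_k)}
    return {
        word: freq
        for word, freq in counts.items()
        if freq >= min_count and word not in most_frequent
    }

# Toy token stream, made up for this example (not taken from any data file).
tokens = "the data of the query in the data for the index of a query".split()
counts = Counter(tokens)

kept = prune_vocabulary(counts, min_count=2, top_k=1)
print(kept)
```

Thresholding on raw counts is the simplest possible choice; a fixed stop-word list or a relative-frequency cutoff would be equally reasonable alternatives.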

Word_frequency

We can count word frequencies efficiently with the Counter class from the collections module, and then sort the counts by frequency.

In [5]:
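from collections import Counter
import re

voc = Counter()  # maps each word to its number of occurrences
with open("../data/vldb.txt") as f:  # alternative corpus: reuters2.txt
    for line in f:
        # Lowercase the line and keep only runs of ASCII letters as tokens.
        tokens = re.findall('[a-zA-Z]+', line.lower())
        voc.update(tokens)  # update() counts the items of an iterable directly
print(voc.most_common(10))  # the ten most frequent words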
[('for', 1343), ('a', 1078), ('data', 1070), ('of', 1010), ('in', 984), ('and', 945), ('the', 750), ('database', 517), ('on', 455), ('databases', 397)]
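The same kind of counter can also show how many words are rare. The sketch below is a self-contained illustration that tokenizes three made-up title-like lines (assumed sample text, not the contents of vldb.txt) the same way as above, and then counts how many distinct words occur exactly n times.

```python
from collections import Counter
import re

# Made-up sample lines for illustration; not taken from the data file.
sample_lines = [
    "Efficient query processing for XML data",
    "Query optimization for relational data",
    "A survey of data stream processing",
]

voc = Counter()
for line in sample_lines:
    voc.update(re.findall('[a-zA-Z]+', line.lower()))

# Frequency of frequencies: maps n to the number of distinct words
# that occur exactly n times in the sample.
freq_of_freq = Counter(voc.values())
for n, num_words in sorted(freq_of_freq.items()):
    print(f"{num_words} word(s) occur {n} time(s)")

# Words that occur only once in the sample.
singletons = sorted(word for word, n in voc.items() if n == 1)
print(singletons)
```

Even in this tiny sample most distinct words occur only once, which is the kind of long tail that motivates treating rare words separately.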
In [6]:
# Sort the (word, count) pairs by count, highest first; dicts keep insertion order.
voc_sorted = dict(sorted(voc.items(), key=lambda item: item[1], reverse=True))
print(voc_sorted)
{'for': 1343, 'a': 1078, 'data': 1070, 'of': 1010, 'in': 984, 'and': 945, 'the': 750, 'database': 517, 'on': 455, 'databases': 397, 'query': 347, 'an': 323, 'to': 323, 'queries': 309, 'with': 293, 'system': 280, 'based': 279, 'systems': 250, 'processing': 240, 'efficient': 229, 'management': 218, 'large': 201, 'xml': 200, 'using': 200, 'relational': 189, 'web': 171, 'distributed': 170, 'search': 159, 'approach': 143, 'information': 142, 'model': 131, 'object': 127, 'mining': 127, 'design': 118, 'base': 117, 'optimization': 117, 'performance': 113, 'time': 112, 'multi': 101, 'over': 101, 'analysis': 95, 'indexing': 95, 'from': 94, 'access': 93, 'spatial': 92, 'schema': 91, 'algorithms': 89, 'join': 87, 'memory': 85, 'parallel': 85, 'streams': 83, 'semantic': 83, 'scalable': 80, 'tree': 80, 'framework': 79, 'dynamic': 79, 'high': 78, 'applications': 78, 'storage': 77, 'language': 77, 'bases': 76, 'querying': 74, 'by': 74, 'similarity': 72, 'index': 72, 'oriented': 72, 'evaluation': 71, 'integration': 70, 'sql': 68, 'k': 68, 'temporal': 68, 'privacy': 67, 'new': 65, 'services': 65, 'control': 65, 'graph': 63, 'very': 63, 'support': 63, 'views': 63, 'matching': 62, 'transaction': 61, 'networks': 60, 'adaptive': 60, 'fast': 60, 'scale': 59, 'retrieval': 59, 'dimensional': 59, 'knowledge': 56, 'towards': 56, 'top': 55, 'stream': 54, 'application': 53, 'approximate': 52, 'architecture': 51, 'real': 51, 'objects': 50, 'joins': 50, 'probabilistic': 49, 's': 49, 'dbms': 48, 'implementation': 48, 'method': 48, 'algorithm': 48, 'techniques': 47, 'interactive': 45, 'p': 45, 'network': 45, 'multiple': 45, 'supporting': 45, 'trees': 45, 'concurrency': 45, 'clustering': 44, 'constraints': 44, 'user': 44, 'server': 43, 'db': 42, 'rules': 42, 'online': 41, 'managing': 41, 'detection': 41, 'peer': 41, 'text': 41, 'pattern': 40, 'execution': 40, 'structure': 39, 'keyword': 39, 'models': 39, 'updates': 38, 'through': 38, 'view': 38, 'continuous': 37, 'graphs': 37, 'non': 37, 
'estimation': 37, 'modeling': 36, 'active': 36, 'integrating': 36, 'documents': 36, 'uncertain': 35, 'heterogeneous': 35, 'oracle': 35, 'complex': 35, 'computing': 34, 'incremental': 34, 'cost': 34, 'distance': 34, 'relations': 34, 'issues': 34, 'transactions': 34, 'structures': 34, 'discovery': 33, 'structured': 33, 'sources': 33, 'methods': 33, 'abstract': 33, 'streaming': 32, 'optimizing': 32, 'integrity': 32, 'cache': 32, 'shared': 32, 'multidimensional': 32, 'technology': 32, 'computation': 31, 'file': 31, 'preserving': 31, 'environment': 31, 'main': 30, 'big': 30, 'aware': 30, 'update': 30, 'analytics': 30, 'semantics': 30, 'conference': 30, 'social': 29, 'nearest': 29, 'e': 29, 'functional': 29, 'dependencies': 29, 'recovery': 29, 'optimal': 29, 'maintenance': 29, 'r': 29, 'programming': 29, 'answering': 29, 'automatic': 29, 'proceedings': 29, 'platform': 28, 'set': 28, 'research': 28, 'engine': 28, 'caching': 28, 'hash': 28, 'panel': 28, 'entity': 27, 'consistency': 27, 'extended': 27, 'hierarchical': 27, 'mapreduce': 26, 'locking': 26, 'path': 26, 'space': 26, 'scheduling': 26, 'as': 26, 'interface': 26, 'sharing': 26, 'study': 26, 'content': 26, 'international': 26, 'nested': 26, 'moving': 25, 'languages': 25, 'association': 25, 'extraction': 25, 'secure': 25, 'logical': 25, 'event': 25, 'driven': 25, 'theory': 25, 'practical': 25, 'tool': 25, 'software': 25, 'buffer': 25, 'deductive': 25, 'technologies': 24, 'partitioning': 24, 'business': 24, 'frequent': 24, 'aggregation': 24, 'monitoring': 24, 'scientific': 24, 'selection': 24, 'strategies': 24, 'generalized': 23, 'warehousing': 23, 'learning': 23, 'rule': 23, 'internet': 23, 'series': 23, 'image': 23, 'b': 23, 'skyline': 23, 'world': 23, 'xquery': 23, 'neighbor': 22, 'under': 22, 'integrated': 22, 'range': 22, 'quality': 22, 'its': 22, 'sensor': 22, 'level': 22, 'sets': 22, 'environments': 22, 'effective': 22, 'mapping': 22, 'materialized': 22, 'service': 21, 'compressed': 21, 'evolution': 21, 
'schemes': 21, 'optimizer': 21, 'olap': 21, 'is': 21, 'aggregate': 21, 'document': 21, 'tuning': 21, 'disk': 21, 'self': 21, 'vldb': 21, 'conceptual': 21, 'generation': 20, 'general': 20, 'sampling': 20, 'improving': 20, 'flexible': 20, 'results': 20, 'highly': 20, 'patterns': 20, 'specification': 20, 'logic': 20, 'multimedia': 20, 'exploration': 19, 'datasets': 19, 'one': 19, 'efficiently': 19, 'mobile': 19, 'massive': 19, 'challenges': 19, 'ranking': 19, 'warehouse': 19, 'selectivity': 19, 'what': 19, 'engineering': 19, 'manager': 19, 'translation': 19, 'special': 19, 'issue': 19, 'structural': 18, 'building': 18, 'declarative': 18, 'exploiting': 18, 'searching': 18, 'answers': 18, 'statistical': 18, 'security': 18, 'physical': 18, 'problem': 18, 'strategy': 18, 'load': 18, 'extensible': 18, 'processor': 18, 'how': 17, 'hashing': 17, 'enabling': 17, 'string': 17, 'spatio': 17, 'compression': 17, 'workflow': 17, 'resolution': 17, 'robust': 17, 'two': 17, 'architectures': 17, 'into': 17, 'linear': 17, 'functions': 17, 'or': 17, 'generic': 17, 'client': 17, 'hybrid': 16, 'replication': 16, 'paper': 16, 'implementing': 16, 'via': 16, 'concurrent': 16, 'incomplete': 16, 'cloud': 16, 'global': 16, 'random': 16, 'rdf': 16, 'at': 16, 'aggregates': 16, 'warehouses': 16, 'form': 16, 'process': 16, 'practice': 16, 'context': 16, 'th': 16, 'grid': 16, 'case': 16, 'sites': 16, 'experimental': 15, 'benchmark': 15, 'commerce': 15, 'resource': 15, 'replicated': 15, 'spaces': 15, 'concept': 15, 'visual': 15, 'i': 15, 'o': 15, 'attribute': 15, 'technique': 15, 'development': 15, 'survey': 15, 'engines': 15, 'presence': 15, 'between': 15, 'classification': 15, 'manipulation': 15, 'toward': 15, 'universal': 15, 'domain': 15, 'unified': 15, 'value': 15, 'federated': 15, 'scheme': 15, 'extending': 15, 'tools': 15, 'interfaces': 15, 'algebraic': 15, 'line': 15, 'methodology': 15, 'operations': 15, 'analytical': 14, 'advanced': 14, 'up': 14, 'it': 14, 'window': 14, 'evaluating': 14, 
'constraint': 14, 'exploring': 14, 'order': 14, 'checking': 14, 'automated': 14, 'road': 14, 'mappings': 14, 'problems': 14, 'local': 14, 'operator': 14, 'histograms': 14, 'cube': 14, 'machine': 14, 'external': 14, 'use': 14, 'files': 14, 'maintaining': 14, 'some': 14, 'transformation': 14, 'multiprocessor': 14, 'hierarchies': 14, 'impact': 14, 'workloads': 13, 'store': 13, 'location': 13, 'microsoft': 13, 'indexes': 13, 'record': 13, 'values': 13, 'hardware': 13, 'decision': 13, 'publishing': 13, 'type': 13, 'estimating': 13, 'size': 13, 'expressions': 13, 'provenance': 13, 'reverse': 13, 'clusters': 13, 'next': 13, 'directions': 13, 'xpath': 13, 'sequence': 13, 'tracking': 13, 'optimized': 13, 'tables': 13, 'centric': 13, 'novel': 13, 'change': 13, 'feedback': 13, 'low': 13, 'prediction': 13, 'processes': 13, 'program': 13, 'production': 13, 'types': 13, 'open': 12, 'making': 12, 'logging': 12, 'long': 12, 'allocation': 12, 'discovering': 12, 'complexity': 12, 'trajectory': 12, 'future': 12, 'way': 12, 'table': 12, 'comparison': 12, 'guarantees': 12, 'lazy': 12, 'scaling': 12, 'schemas': 12, 'column': 12, 'parallelism': 12, 'sequences': 12, 'result': 12, 'approximation': 12, 'that': 12, 'finding': 12, 'exact': 12, 'overview': 12, 'machines': 12, 'their': 12, 'algebra': 12, 'sort': 12, 'distribution': 12, 'formal': 12, 'persistent': 12, 'semistructured': 12, 'relationship': 12, 'computer': 12, 'modelling': 12, 'multidatabase': 12, 'are': 11, 'workload': 11, 'publish': 11, 'subscribe': 11, 'your': 11, 'personal': 11, 'servers': 11, 'twig': 11, 'placement': 11, 'ontology': 11, 'role': 11, 'state': 11, 'c': 11, 'dissemination': 11, 'error': 11, 'lightweight': 11, 'iterative': 11, 'ranked': 11, 'full': 11, 'simple': 11, 'partitioned': 11, 'page': 11, 'deep': 11, 'defined': 11, 'recursive': 11, 'independent': 11, 'fuzzy': 11, 'n': 11, 'end': 11, 'write': 11, 'expression': 11, 'session': 11, 'versions': 11, 'adaptable': 11, 'providing': 11, 'demonstration': 10, 
'reliable': 10, 'cooperative': 10, 'hadoop': 10, 'dependency': 10, 'about': 10, 'processors': 10, 'rdbms': 10, 'independence': 10, 'cleaning': 10, 'metric': 10, 'relevant': 10, 'across': 10, 'progressive': 10, 'power': 10, 'vector': 10, 'free': 10, 'accurate': 10, 'knn': 10, 'best': 10, 'like': 10, 'not': 10, 'rfid': 10, 'efficiency': 10, 'sorting': 10, 'protection': 10, 'middleware': 10, 'plan': 10, 'filtering': 10, 'uncertainty': 10, 'detecting': 10, 'consistent': 10, 'metadata': 10, 'biological': 10, 'composite': 10, 'anonymity': 10, 'conscious': 10, 'collection': 10, 'balancing': 10, 'introduction': 10, 'invited': 10, 'complete': 10, 'predicate': 10, 'perspective': 10, 'organization': 10, 'cubes': 10, 'history': 10, 'log': 9, 'virtual': 9, 'constrained': 9, 'intelligent': 9, 'grained': 9, 'updating': 9, 'sensitive': 9, 'oltp': 9, 'exchange': 9, 'throughput': 9, 'ad': 9, 'hoc': 9, 'crowdsourcing': 9, 'inference': 9, 'community': 9, 'behavior': 9, 'summarization': 9, 'attributes': 9, 'enterprise': 9, 'cluster': 9, 'flash': 9, 'regular': 9, 'generating': 9, 'subgraph': 9, 'understanding': 9, 'arbitrary': 9, 'measures': 9, 'construction': 9, 'semi': 9, 'anonymization': 9, 'relationships': 9, 'representation': 9, 'programs': 9, 'snapshot': 9, 'source': 9, 'handling': 9, 'resident': 9, 'collaborative': 9, 'digital': 9, 'why': 9, 'volume': 9, 'intensive': 9, 'mechanism': 9, 'phase': 9, 'different': 9, 'transformations': 9, 'conversion': 9, 'protocols': 9, 'transitive': 9, 'experiences': 9, 'closure': 9, 'cad': 9, 'expert': 9, 'predicates': 9, 'partial': 9, 'communication': 9, 'private': 8, 'fine': 8, 'reducing': 8, 'preference': 8, 'collections': 8, 'duplicate': 8, 'linkage': 8, 'reasoning': 8, 'unstructured': 8, 'modern': 8, 'you': 8, 'approaches': 8, 'developing': 8, 'analyzing': 8, 'map': 8, 'relation': 8, 'infrastructure': 8, 'more': 8, 'node': 8, 'point': 8, 'wavelet': 8, 'differential': 8, 'class': 8, 'binary': 8, 'speed': 8, 'analytic': 8, 'stores': 8, 
'enhanced': 8, 'auditing': 8, 'datalog': 8, 'definition': 8, 'media': 8, 'scans': 8, 'asynchronous': 8, 'fusion': 8, 'devices': 8, 'safe': 8, 'ordering': 8, 'without': 8, 'video': 8, 'dbmss': 8, 'cardinality': 8, 'testing': 8, 'pagerank': 8, 'correlation': 8, 'wireless': 8, 'improved': 8, 'g': 8, 'policies': 8, 'points': 8, 'papers': 8, 'availability': 8, 'tutorial': 8, 'computers': 8, 'validation': 8, 'natural': 8, 'version': 8, 'histogram': 8, 'loading': 8, 'electronic': 8, 'ir': 8, 'indices': 8, 'parametric': 8, 'normal': 8, 'statement': 8, 'statistics': 8, 'commercial': 8, 'codasyl': 8, 'heterogeneity': 8, 'expressive': 7, 'key': 7, 'diversity': 7, 'message': 7, 'core': 7, 'decomposition': 7, 'transactional': 7, 'sliding': 7, 'plans': 7, 'group': 7, 'recommendation': 7, 'when': 7, 'configuration': 7, 'output': 7, 'correlations': 7, 'composition': 7, 'paradigm': 7, 'advisor': 7, 'x': 7, 'users': 7, 'human': 7, 'early': 7, 'action': 7, 'encrypted': 7, 'weight': 7, 'experience': 7, 'comments': 7, 'spatiotemporal': 7, 'clustered': 7, 'wise': 7, 'read': 7, 'ontologies': 7, 'implications': 7, 'block': 7, 'profiling': 7, 'subspace': 7, 'personalized': 7, 'capturing': 7, 'interaction': 7, 'rewriting': 7, 'skew': 7, 'bounded': 7, 'conjunctive': 7, 'feature': 7, 'beyond': 7, 'link': 7, 'solution': 7, 't': 7, 'down': 7, 'd': 7, 'bayesian': 7, 'gene': 7, 'simulation': 7, 'do': 7, 'bioinformatics': 7, 'industrial': 7, 'grouping': 7, 'sequential': 7, 'wide': 7, 'policy': 7, 're': 7, 'reorganization': 7, 'solutions': 7, 'buffering': 7, 'report': 7, 'redundancy': 7, 'autonomous': 7, 'bulk': 7, 'outsourced': 7, 'evolving': 7, 'features': 7, 'derived': 7, 'properties': 7, 'imprecise': 7, 'trajectories': 6, 'predictive': 6, 'protocol': 6, 'itemset': 6, 'shortest': 6, 'lineage': 6, 'repository': 6, 'crowd': 6, 'delta': 6, 'sap': 6, 'geo': 6, 'near': 6, 'protein': 6, 'fly': 6, 'cross': 6, 'gpu': 6, 'etl': 6, 'nothing': 6, 'synopses': 6, 'edit': 6, 'disks': 6, 'workflows': 6, 
'migration': 6, 'alignment': 6, 'opportunities': 6, 'tpc': 6, 'single': 6, 'energy': 6, 'challenge': 6, 'replacement': 6, 'intelligence': 6, 'taxonomy': 6, 'lists': 6, 'small': 6, 'pay': 6, 'match': 6, 'events': 6, 'grids': 6, 'preservation': 6, 'master': 6, 'prototype': 6, 'counting': 6, 'can': 6, 'sparse': 6, 'distinct': 6, 'modular': 6, 'concepts': 6, 'extracting': 6, 'vs': 6, 'bottom': 6, 'progress': 6, 'identification': 6, 'filters': 6, 'routing': 6, 'relevance': 6, 'crawling': 6, 'proximity': 6, 'among': 6, 'partially': 6, 'ordered': 6, 'organizing': 6, 'available': 6, 'ahead': 6, 'multiversion': 6, 'encryption': 6, 'during': 6, 'effects': 6, 'specifications': 6, 'demand': 6, 'geometric': 6, 'recommendations': 6, 'randomized': 6, 'bridging': 6, 'applying': 6, 'constructing': 6, 'capabilities': 6, 'example': 6, 'shedding': 6, 'dynamically': 6, 'correctness': 6, 'authorization': 6, 'freshness': 6, 'interval': 6, 'guest': 6, 'editorial': 6, 'subset': 6, 'dual': 6, 'oodb': 6, 'accessing': 6, 'automating': 6, 'abstraction': 6, 'office': 6, 'activity': 6, 'repositories': 6, 'extension': 6, 'authority': 6, 'purpose': 6, 'native': 6, 'graphical': 6, 'principles': 6, 'uniform': 6, 'prefetching': 5, 'architectural': 5, 'ontological': 5, 'weighted': 5, 'locally': 5, 'all': 5, 'elastic': 5, 'disclosure': 5, 'contention': 5, 'improve': 5, 'earth': 5, 'graphics': 5, 'reconciliation': 5, 'mixed': 5, 'vision': 5, 'neighbors': 5, 'resources': 5, 'comparative': 5, 'reduce': 5, 'work': 5, 'solving': 5, 'serial': 5, 'measuring': 5, 'enabled': 5, 'traffic': 5, 'array': 5, 'scan': 5, 'chip': 5, 'subsystem': 5, 'consolidation': 5, 'symmetric': 5, 'verification': 5, 'cpu': 5, 'easy': 5, 'elephants': 5, 'internal': 5, 'reachability': 5, 'storing': 5, 'reference': 5, 'truth': 5, 'explaining': 5, 'taming': 5, 'failures': 5, 'merging': 5, 'predicting': 5, 'projections': 5, 'optimize': 5, 'isolation': 5, 'revisited': 5, 'route': 5, 'regions': 5, 'deployment': 5, 'grams': 5, 'industry': 
5, 'closed': 5, 'commit': 5, 'minimizing': 5, 'variable': 5, 'preferences': 5, 'related': 5, 'outliers': 5, 'characteristics': 5, 'interesting': 5, 'publication': 5, 'avoiding': 5, 'platforms': 5, 'versatile': 5, 'heterogenous': 5, 'run': 5, 'valued': 5, 'foundations': 5, 'family': 5, 'requirements': 5, 'historical': 5, 'healthcare': 5, 'rights': 5, 'off': 5, 'records': 5, 'changing': 5, 'qos': 5, 'selective': 5, 'area': 5, 'authentication': 5, 'wavelets': 5, 'kernel': 5, 'reduction': 5, 'visualization': 5, 'skewed': 5, 'failure': 5, 'costs': 5, 'cooperating': 5, 'interoperability': 5, 'embedded': 5, 'splitting': 5, 'dominating': 5, 'timestamp': 5, 'classifier': 5, 'facilities': 5, 'pass': 5, 'networked': 5, 'meta': 5, 'optimistic': 5, 'coherency': 5, 'function': 5, 'referential': 5, 'alternative': 5, 'acceleration': 5, 'postgres': 5, 'relative': 5, 'm': 5, 'description': 5, 'arrays': 5, 'pre': 5, 'crawler': 5, 'containment': 5, 'norms': 5, 'zero': 5, 'broadcast': 5, 'principle': 5, 'dictionary': 5, 'lp': 5, 'deferred': 5, 'assignment': 5, 'propagation': 5, 'test': 5, 'dimensionality': 5, 'composing': 5, 'deadlock': 5, 'life': 5, 'star': 5, 'biodiversity': 5, 'tertiary': 5, 'secondary': 5, 'three': 5, 'restructuring': 5, 'inclusion': 5, 'medical': 5, 'calibrating': 4, 'differentially': 4, 'subsequence': 4, 'summaries': 4, 'textual': 4, 'parameter': 4, 'limited': 4, 'dynamics': 4, 'tolerant': 4, 'creation': 4, 'forms': 4, 'recommender': 4, 'taxonomies': 4, 'bounds': 4, 'trust': 4, 'running': 4, 'jobs': 4, 'samples': 4, 'ibm': 4, 'expansion': 4, 'higher': 4, 'planning': 4, 'common': 4, 'questions': 4, 'comprehensive': 4, 'representations': 4, 'just': 4, 'scalability': 4, 'archiving': 4, 'itemsets': 4, 'only': 4, 'sketches': 4, 'comparing': 4, 'surface': 4, 'stochastic': 4, 'device': 4, 'precision': 4, 'f': 4, 'product': 4, 'logs': 4, 'fpgas': 4, 'spam': 4, 'operators': 4, 'dimensions': 4, 'modal': 4, 'depth': 4, 'frequency': 4, 'maximizing': 4, 'rapid': 4, 'compact': 
4, 'groups': 4, 'runtime': 4, 'trends': 4, 'toolkit': 4, 'summarizing': 4, 'networking': 4, 'procedures': 4, 'coordination': 4, 'public': 4, 'answer': 4, 'fixes': 4, 'outlier': 4, 'suffix': 4, 'region': 4, 'identifying': 4, 'conflicts': 4, 'influence': 4, 'matrix': 4, 'hierarchy': 4, 'redundant': 4, 'entities': 4, 'enumeration': 4, 'massively': 4, 'encoding': 4, 'guarantee': 4, 'effect': 4, 'gap': 4, 'manage': 4, 'merge': 4, 'nn': 4, 'configurations': 4, 'signatures': 4, 'proof': 4, 'mass': 4, 'revisiting': 4, 'co': 4, 'style': 4, 'navigation': 4, 'workshop': 4, 'september': 4, 'desktop': 4, 'first': 4, 'anomalies': 4, 'bucket': 4, 'intrusion': 4, 'trade': 4, 'computational': 4, 'attacks': 4, 'art': 4, 'where': 4, 'smart': 4, 'editors': 4, 'substring': 4, 'maximum': 4, 'hot': 4, 'components': 4, 'vertical': 4, 'topss': 4, 'coverage': 4, 'be': 4, 'than': 4, 'nd': 4, 'experiments': 4, 'l': 4, 'reality': 4, 'bit': 4, 'declustering': 4, 'hypertext': 4, 'transposed': 4, 'de': 4, 'proxy': 4, 'minimization': 4, 'combining': 4, 'watermarking': 4, 'theoretical': 4, 'assessment': 4, 'bound': 4, 'project': 4, 'quantitative': 4, 'regression': 4, 'personalization': 4, 'prototyping': 4, 'science': 4, 'deterministic': 4, 'browsing': 4, 'operating': 4, 'directory': 4, 'metrics': 4, 'granularity': 4, 'shapes': 4, 'ubiquitous': 4, 'aspects': 4, 'specific': 4, 'v': 4, 'part': 4, 'ii': 4, 'boolean': 4, 'pragmatic': 4, 'component': 4, 'current': 4, 'images': 4, 'length': 4, 'dedicated': 4, 'designing': 4, 'powerful': 4, 'normalization': 4, 'preface': 4, 'basis': 4, 'null': 4, 'serializability': 4, 'calculus': 4, 'priority': 4, 'cell': 4, 'eliminating': 4, 'vlsi': 4, 'instance': 4, 'criteria': 4, 'direct': 4, 'hidden': 4, 'synopsis': 4, 'anytime': 4, 'response': 4, 'pc': 4, 'theoretic': 4, 'facility': 4, 'signature': 4, 'revolution': 4, 'primitives': 4, 'designs': 4, 'triggers': 4, 'unifying': 4, 'balanced': 4, 'year': 4, 'w': 4, 'overlay': 4, 'embedding': 4, 'catalog': 4, 'portals': 4, 
'legacy': 4, 'inverse': 4, 'we': 4, 'integrator': 4, 'national': 4, 'minimal': 4, 'empirical': 4, 'starburst': 4, 'dealing': 4, 'locks': 4, 'reordering': 4, 'mediator': 4, 'sum': 4, 'wrapper': 4, 'equivalence': 4, 'operation': 4, 'warping': 4, 'does': 4, 'recursion': 4, 'customer': 3, 'executing': 3, 'predictable': 3, 'prefix': 3, 'them': 3, 'locality': 3, 'risk': 3, 'optimizers': 3, 'lock': 3, 'explore': 3, 'gis': 3, 'hard': 3, 'list': 3, 'polynomial': 3, 'timestamping': 3, 'achieving': 3, 'fault': 3, 'layer': 3, 'practices': 3, 'tuples': 3, 'twitter': 3, 'law': 3, 'game': 3, 'behavioral': 3, 'consumption': 3, 'estimate': 3, 'matters': 3, 'pushing': 3, 'solid': 3, 'past': 3, 'capacity': 3, 'conditions': 3, 'similarities': 3, 'commodity': 3, 'parallelizing': 3, 'assisted': 3, 'label': 3, 'build': 3, 'managers': 3, 'success': 3, 'inheritance': 3, 'evolutionary': 3, 'authenticated': 3, 'accelerating': 3, 'dataflow': 3, 'back': 3, 'expressiveness': 3, 'multicore': 3, 'scenarios': 3, 'thousand': 3, 'stars': 3, 'approximating': 3, 'stack': 3, 'qs': 3, 'annotation': 3, 'land': 3, 'basic': 3, 'everything': 3, 'genomic': 3, 'meet': 3, 'delivery': 3, 'numerical': 3, 'transforming': 3, 'organizations': 3, 'probing': 3, 'fourth': 3, 'task': 3, 'sense': 3, 'annotated': 3, 'ratio': 3, 'threshold': 3, 'counts': 3, 'micro': 3, 'prevention': 3, 'anonymizing': 3, 'pool': 3, 'correlated': 3, 'who': 3, 'algorithmic': 3, 'go': 3, 'factor': 3, 'conditional': 3, 'diagrams': 3, 'clouds': 3, 'outer': 3, 'distances': 3, 'navigating': 3, 'certain': 3, 'era': 3, 'health': 3, 'localized': 3, 'nosql': 3, 'latency': 3, 'dense': 3, 'termination': 3, 'maximization': 3, 'difference': 3, 'cpus': 3, 'aggregating': 3, 'mashup': 3, 'maps': 3, 'extensions': 3, 'levels': 3, 'surfing': 3, 'no': 3, 'leakage': 3, 'black': 3, 'topic': 3, 'look': 3, 'phrase': 3, 'library': 3, 'genetic': 3, 'microdata': 3, 'don': 3, 'scope': 3, 'examples': 3, 'mechanisms': 3, 'elimination': 3, 'evidence': 3, 'density': 3, 
'filter': 3, 'representative': 3, 'piecewise': 3, 'frequently': 3, 'mitigating': 3, 'domains': 3, 'linking': 3, 'navigational': 3, 'coupled': 3, 'measurements': 3, 'decentralized': 3, 'putting': 3, 'identity': 3, 'vienna': 3, 'limitations': 3, 'second': 3, 'computations': 3, 'quasi': 3, 'id': 3, 'reservoir': 3, 'guessing': 3, 'microaggregation': 3, 'binding': 3, 'ordbms': 3, 'java': 3, 'minimum': 3, 'bottleneck': 3, 'out': 3, 'bandwidth': 3, 'enhancements': 3, 'expensive': 3, 'pair': 3, 'less': 3, 'selecting': 3, 'optimizations': 3, 'telecommunication': 3, 'tier': 3, 'memories': 3, 'refinement': 3, 'xwave': 3, 'characterization': 3, 'schemata': 3, 'dream': 3, 'hypothetical': 3, 'show': 3, 'subgraphs': 3, 'generalization': 3, 'associative': 3, 'utility': 3, 'inferring': 3, 'compliance': 3, 'guide': 3, 'grammars': 3, 'rdb': 3, 'countries': 3, 'autonomic': 3, 'rd': 3, 'curse': 3, 'loss': 3, 'benchmarking': 3, 'portal': 3, 'tradeoffs': 3, 'coarse': 3, 'granularities': 3, 'maximal': 3, 'controlling': 3, 'tracing': 3, 'partition': 3, 'assistant': 3, 'connected': 3, 'stored': 3, 'decade': 3, 'standards': 3, 'loop': 3, 'tradeoff': 3, 'cupid': 3, 'intervals': 3, 'sub': 3, 'u': 3, 'ds': 3, 'good': 3, 'completeness': 3, 'bookmark': 3, 'financial': 3, 'have': 3, 'everywhere': 3, 'promises': 3, 'bayes': 3, 'technical': 3, 'customizing': 3, 'status': 3, 'schemasql': 3, 'generator': 3, 'entropy': 3, 'scoring': 3, 'equijoin': 3, 'gram': 3, 'inverted': 3, 'chain': 3, 'libraries': 3, 'bitmap': 3, 'resilient': 3, 'administration': 3, 'bank': 3, 'delay': 3, 'z': 3, 'sky': 3, 'foundation': 3, 'intra': 3, 'need': 3, 'choosing': 3, 'commitment': 3, 'journal': 3, 'systematic': 3, 'expressed': 3, 'objectrank': 3, 'voronoi': 3, 'outsourcing': 3, 'coral': 3, 'cient': 3, 'disjunctive': 3, 'finite': 3, 'compliant': 3, 'applied': 3, 'restart': 3, 'combined': 3, 'steps': 3, 'tuple': 3, 'overhead': 3, 'sabre': 3, 'wavecluster': 3, 'mismatch': 3, 'versus': 3, 'synthetic': 3, 'genome': 3, 
'extendible': 3, 'max': 3, 'quorum': 3, 'timber': 3, 'notification': 3, 'hippocratic': 3, 'oodbms': 3, 'procedural': 3, 'anatomy': 3, 'cubing': 3, 'crashes': 3, 'providers': 3, 'pages': 3, 'diagram': 3, 'ingres': 3, 'site': 3, 'heuristic': 3, 'informatics': 3, 'dimension': 3, 'markov': 3, 'light': 3, 'automatically': 3, 'projection': 3, 'accuracy': 3, 'enhancing': 3, 'monotonic': 3, 'acyclic': 3, 'iro': 3, 'fully': 3, 'removal': 3, 'extreme': 3, 'automata': 3, 'firm': 3, 'almost': 3, 'manipulating': 3, 'ansi': 3, 'keywords': 3, 'meets': 3, 'archives': 3, 'predictions': 3, 'pipelined': 3, 'creating': 3, 'browser': 3, 'combination': 3, 'garbage': 3, 'sundgren': 3, 'disseminating': 3, 'holistic': 3, 'gamma': 3, 'customizable': 3, 'aries': 3, 'workstation': 3, 'euclidean': 3, 'multiprocessors': 3, 'translating': 3, 'automation': 3, 'reuse': 3, 'changes': 3, 'addressable': 3, 'side': 3, 'categorical': 3, 'right': 3, 'q': 3, 'discrete': 3, 'searches': 3, 'utilization': 3, 'periodic': 3, 'codd': 3, 'encipherment': 3, 'trie': 3, 'clip': 3, 'interactions': 3, 'latent': 2, 'following': 2, 'quantification': 2, 'rose': 2, 'updatable': 2, 'evolvable': 2, 'transforms': 2, 'effectively': 2, 'tenant': 2, 'synthesis': 2, 'h': 2, 'performances': 2, 'mirror': 2, 'they': 2, 'gaps': 2, 'tolerances': 2, 'humming': 2, 'billion': 2, 'field': 2, 'done': 2, 'guided': 2, 'repair': 2, 'leveraging': 2, 'utilizing': 2, 'authenticating': 2, 'powered': 2, 'observatory': 2, 'compacting': 2, 'postgresql': 2, 'repairs': 2, 'violations': 2, 'intersection': 2, 'skip': 2, 'forecasting': 2, 'paras': 2, 'compiling': 2, 'strong': 2, 'fragments': 2, 'associations': 2, 'compilation': 2, 'influential': 2, 'upper': 2, 'lower': 2, 'prism': 2, 'trustworthiness': 2, 'slicing': 2, 'most': 2, 'monitor': 2, 'mad': 2, 'skills': 2, 'causality': 2, 'motif': 2, 'convoys': 2, 'growing': 2, 'apache': 2, 'repairing': 2, 'mt': 2, 'faster': 2, 'autocompletion': 2, 'simd': 2, 'ultra': 2, 'units': 2, 'ask': 2, 'hana': 2, 
'hop': 2, 'tractable': 2, 'preprocessing': 2, 'eyes': 2, 'hosted': 2, 'continuously': 2, 'speculative': 2, 'awareness': 2, 'assignments': 2, 'relaxed': 2, 'exploratory': 2, 'mashups': 2, 'tabular': 2, 'mr': 2, 'net': 2, 'pig': 2, 'made': 2, 'hyper': 2, 'aggressive': 2, 'drives': 2, 'surrogate': 2, 'protected': 2, 'informative': 2, 'extracted': 2, 'spatialhadoop': 2, 'ssd': 2, 'secret': 2, 'dags': 2, 'inconsistent': 2, 'confidential': 2, 'has': 2, 'think': 2, 'vertex': 2, 'news': 2, 'broadband': 2, 'concise': 2, 'essential': 2, 'extract': 2, 'cracking': 2, 'white': 2, 'evolve': 2, 'missing': 2, 'isomorphism': 2, 'lattice': 2, 'rex': 2, 'pairs': 2, 'sparql': 2, 'static': 2, 'demo': 2, 'mixture': 2, 'alvisp': 2, 'today': 2, 'centers': 2, 'socially': 2, 'smarter': 2, 'planet': 2, 'haloop': 2, 'equivalent': 2, 'tasks': 2, 'bipartite': 2, 'groupings': 2, 'market': 2, 'topology': 2, 'eager': 2, 'guaranteeing': 2, 'modification': 2, 'xoring': 2, 'erasure': 2, 'codes': 2, 'predict': 2, 'periscope': 2, 'negation': 2, 'quest': 2, 'registration': 2, 'raw': 2, 'immutable': 2, 'revenue': 2, 'cloudia': 2, 'bursty': 2, 'tagged': 2, 'provisioning': 2, 'make': 2, 'dtw': 2, 'editing': 2, 'vertically': 2, 'strings': 2, 'spreadsheet': 2, 'another': 2, 'relaxation': 2, 'parameters': 2, 'connectivity': 2, 'handle': 2, 'lsh': 2, 'google': 2, 'tagging': 2, 'rich': 2, 'wrappers': 2, 'chase': 2, 'return': 2, 'mcc': 2, 'advertising': 2, 'gpus': 2, 'factorization': 2, 'years': 2, 'stop': 2, 'word': 2, 'viewing': 2, 'elasticity': 2, 'live': 2, 'going': 2, 'snp': 2, 'away': 2, 'trickles': 2, 'jim': 2, 'edge': 2, 'perturbation': 2, 'opening': 2, 'forward': 2, 'simrank': 2, 'spark': 2, 'ensembles': 2, 'bf': 2, 'deluge': 2, 'channel': 2, 'profiles': 2, 'nema': 2, 'getting': 2, 'hypergraphs': 2, 'items': 2, 'glav': 2, 'exactly': 2, 'usability': 2, 'empty': 2, 'story': 2, 'bloom': 2, 'pl': 2, 'synthesizing': 2, 'products': 2, 'learned': 2, 'genomes': 2, 'portable': 2, 'diversification': 2, 
'quotient': 2, 'yellow': 2, 'cheetah': 2, 'reformulation': 2, 'rs': 2, 'direction': 2, 'simplification': 2, 'dbtoaster': 2, 'fresh': 2, 'friendly': 2, 'aether': 2, 'paris': 2, 'instances': 2, 'agnostic': 2, 'blocking': 2, 'convex': 2, 'hausdorff': 2, 'construct': 2, 'bisimulation': 2, 'existing': 2, 'mover': 2, 'terms': 2, 'telco': 2, 'better': 2, 'became': 2, 'profile': 2, 'austria': 2, 'named': 2, 'recognition': 2, 'st': 2, 'preventing': 2, 'invasive': 2, 'virtues': 2, 'joining': 2, 'semantically': 2, 'anonymous': 2, 'audit': 2, 'evaluate': 2, 'aida': 2, 'crowds': 2, 'factors': 2, 'recent': 2, 'volatile': 2, 'decompositions': 2, 'tamper': 2, 'background': 2, 'offs': 2, 'numeric': 2, 'drill': 2, 'nt': 2, 'buddy': 2, 'simulatable': 2, 'telecommunications': 2, 'malicious': 2, 'predeclared': 2, 'attack': 2, 'brokering': 2, 'templates': 2, 'wrap': 2, 'third': 2, 'rethinking': 2, 'ogsa': 2, 'articles': 2, 'exhibit': 2, 'defining': 2, 'conflict': 2, 'administrative': 2, 'searchable': 2, 'modularization': 2, 'xmark': 2, 'hashmap': 2, 'adapting': 2, 'anomalous': 2, 'requests': 2, 'should': 2, 'this': 2, 'rank': 2, 'overlapping': 2, 'well': 2, 'behaved': 2, 'cover': 2, 'satisfaction': 2, 'ee': 2, 'money': 2, 'vectors': 2, 'ldap': 2, 'michigan': 2, 'confidence': 2, 'seconds': 2, 'fact': 2, 'opportunistic': 2, 'least': 2, 'spider': 2, 'reputation': 2, 'ing': 2, 'generated': 2, 'resolving': 2, 'sequenced': 2, 'reading': 2, 'inputs': 2, 'layered': 2, 'feasibility': 2, 'tune': 2, 'optimisation': 2, 'sophisticate': 2, 'presentation': 2, 'derivation': 2, 'subspaces': 2, 'ode': 2, 'clues': 2, 'routes': 2, 'horn': 2, 'intensional': 2, 'informix': 2, 'versioning': 2, 'considerations': 2, 'synchronization': 2, 'inside': 2, 'heavy': 2, 'hitters': 2, 'paving': 2, 'xperanto': 2, 'assertions': 2, 'implement': 2, 'miro': 2, 'terabytes': 2, 'adding': 2, 'staircase': 2, 'similar': 2, 'dtds': 2, 'indeterminacy': 2, 'prisma': 2, 'tests': 2, 'mail': 2, 'korea': 2, 'agent': 2, 'reconstruction': 
2, 'currency': 2, 'infrastructures': 2, 'simplify': 2, 'feasible': 2, 'home': 2, 'tries': 2, 'super': 2, 'serv': 2, 'negotiation': 2, 'gral': 2, 'generalizing': 2, 'which': 2, 'mdl': 2, 'physics': 2, 'na': 2, 've': 2, 'remarks': 2, 'et': 2, 'rasdaman': 2, 'weave': 2, 'ob': 2, 'pointer': 2, 'swizzling': 2, 'operational': 2, 'loosely': 2, 'grace': 2, 'tolerance': 2, 'complementary': 2, 'improvements': 2, 'cassm': 2, 'cellular': 2, 'four': 2, 'supervised': 2, 'factoring': 2, 'aqualogic': 2, 'used': 2, 'website': 2, 'awesome': 2, 'item': 2, 'specified': 2, 'trustworthy': 2, 'neural': 2, 'materialization': 2, 'vldbs': 2, 'teradata': 2, 'rankmass': 2, 'per': 2, 'census': 2, 'hamming': 2, 'determining': 2, 'hands': 2, 'refresh': 2, 'chief': 2, 'recovering': 2, 'configurable': 2, 'aided': 2, 'micronet': 2, 'locating': 2, 'fill': 2, 'modified': 2, 'hb': 2, 'standardization': 2, 'consistently': 2, 'summarize': 2, 'quantiles': 2, 'peter': 2, 'su': 2, 'pruning': 2, 'hit': 2, 'probability': 2, 'normalized': 2, 'capability': 2, 'xsearch': 2, 'ended': 2, 'studies': 2, 'constructs': 2, 'dbtg': 2, 'cardinalities': 2, 'selections': 2, 'oid': 2, 'runs': 2, 'regulatory': 2, 'retention': 2, 'effectiveness': 2, 'priorities': 2, 'against': 2, 'svp': 2, 'shooting': 2, 'archival': 2, 'microprocessor': 2, 'diam': 2, 'foral': 2, 'sybase': 2, 'objective': 2, 'acquisition': 2, 'auto': 2, 'bibliographic': 2, 'informatwn': 2, 'bucketization': 2, 'redistribution': 2, 'closeness': 2, 'needs': 2, 'multiview': 2, 'incorporating': 2, 'autonomy': 2, 'closest': 2, 'deploying': 2, 'experiment': 2, 'mutual': 2, 'hierarchically': 2, 'quist': 2, 'rewrite': 2, 'accelerator': 2, 'recoverable': 2, 'preliminary': 2, 'insurance': 2, 'communications': 2, 'costly': 2, 'multivalued': 2, 'relaxing': 2, 'call': 2, 'requirement': 2, 'bushy': 2, 'iceberg': 2, 'strudel': 2, 'redo': 2, 'after': 2, 'optical': 2, 'taking': 2, 'discriminant': 2, 'molecular': 2, 'incrementally': 2, 'paths': 2, 'cognizant': 2, 'sample': 2, 
'mergesort': 2, 'unnesting': 2, 'fractal': 2, 'activities': 2, 'histories': 2, 'picodbms': 2, 'eos': 2, 'striping': 2, 'funding': 2, 'vague': 2, 'nf': 2, 'instant': 2, 'relating': 2, 'measure': 2, 'neighborhood': 2, 'griddb': 2, 'pervasive': 2, 'situation': 2, 'estimators': 2, 'duplicates': 2, 'separability': 2, 'min': 2, 'aggregations': 2, 'adaptation': 2, 'sensing': 2, 'transport': 2, 'extensibility': 2, 'nests': 2, 'contain': 2, 'subqueries': 2, 'pq': 2, 'pipeline': 2, 'bea': 2, 'potter': 2, 'wheel': 2, 'adaptivity': 2, 'dna': 2, 'quantifiable': 2, 'add': 2, 'having': 2, 'communities': 2, 'rox': 2, 'cleansing': 2, 'deriving': 2, 'speeding': 2, 'adapt': 2, 'qstream': 2, 'describing': 2, 'robustness': 2, 'untrusted': 2, 'useful': 2, 'delayed': 2, 'exceptions': 2, 'codb': 2, 'representing': 2, 'preparatory': 2, 'quadtree': 2, 'distributions': 2, 'definitions': 2, 'hospital': 2, 'coalesced': 2, 'skylines': 2, 'typicality': 2, 'choice': 2, 'lham': 2, 'bibsonomy': 2, 'prone': 2, 'nonlinear': 2, 'fixed': 2, 'magic': 2, 'hp': 2, 'any': 2, 'datacomputer': 2, 'interpolated': 2, 'qualitative': 2, 'satisfying': 2, 'multidatabases': 2, 'fundamental': 2, 'note': 2, 'specifying': 2, 'pictures': 2, 'kriegel': 2, 'fractured': 2, 'mirrors': 2, 'around': 2, 'identifiers': 2, 'elicitation': 2, 'opened': 2, 'correct': 2, 'labeling': 2, 'workstations': 2, 'graffiti': 2, 'oql': 2, 'acm': 2, 'latching': 2, 'topx': 2, 'matrices': 2, 'j': 2, 'messaging': 2, 'oodbs': 2, 'algebras': 2, 'prefetch': 2, 'sorted': 2, 'aurora': 2, 'false': 2, 'negative': 2, 'lixto': 2, 'backend': 2, 'select': 2, 'intranet': 2, 'bo': 2, 'edition': 2, 'get': 2, 'biomedical': 2, 'mil': 2, 'lingering': 2, 'number': 2, 'schematic': 2, 'observation': 2, 'enforcing': 2, 'exodus': 2, 'within': 2, 'sparc': 2, 'obtaining': 2, 'given': 2, 'teams': 2, 'repeating': 2, 'flows': 2, 'assumption': 2, 'india': 2, 'osam': 2, 'sleepers': 2, 'workaholics': 2, 'recency': 2, 'aim': 2, 'fetches': 2, 'motion': 2, 'english': 2, 
'meaning': 2, 'rate': 2, 'multimodal': 2, 'name': 2, 'mems': 2, 'picture': 2, 'tv': 2, 'overlays': 2, 'morphology': 2, 'staging': 2, 'indexed': 2, 'he': 2, 'sirius': 2, 'french': 2, 'center': 2, 'segmented': 2, 'sciences': 2, 'behaviour': 2, 'double': 2, 'mobility': 2, 'property': 2, 'inconsistency': 2, 'windowed': 2, 'transparent': 2, 'remote': 2, 'cyclic': 2, 'interpreting': 2, 'along': 2, 'avoidance': 2, 'xps': 2, 'dal': 2, 'contrast': 2, 'plots': 2, 'sphere': 2, 'adversarial': 2, 'modules': 2, 'destination': 2, 'marriage': 2, 'government': 2, 'diverse': 2, 'ldl': 2, 'topological': 2, 'lifecycle': 2, 'expansions': 2, 'satisfiability': 2, 'focused': 2, 'federations': 2, 'sketching': 2, 'bitmaps': 2, 'faloutsos': 2, 'hilbert': 2, 'fractals': 2, 'vehicle': 2, 'boyce': 2, 'addressing': 2, 'split': 2, 'structuring': 2, 'coexistence': 2, 'foreword': 2, 'chairmen': 2, 'pathfinder': 2, 'multirelations': 2, 'interbase': 2, 'share': 2, 'now': 2, 'bitemporal': 2, 'reports': 2, 'orders': 2, 'backed': 2, 'meaningful': 2, 'broadening': 2, 'secured': 2, 'mediation': 2, 'equations': 2, 'rolap': 2, 'enforcement': 2, 'care': 1, 'trentorise': 1, 'bigdata': 1, 'trend': 1, 'aa': 1, 'asymmetric': 1, 'scout': 1, 'seda': 1, 'hadoopdb': 1, 'sensitivity': 1, 'woo': 1, 'unpredictable': 1, 'lahar': 1, 'markovian': 1, 'myriad': 1, 'numbering': 1, 'dwarfs': 1, 'rearview': 1, 'really': 1, 'lion': 1, 'ring': 1, 'clustera': 1, 'hum': 1, 'song': 1, 'ranges': 1, 'unbundled': 1, 'tweets': 1, 'our': 1, 'grow': 1, 'blink': 1, 'drillbeyond': 1, 'analysts': 1, 'iflow': 1, 'detouring': 1, 'confidentiality': 1, 'picasso': 1, 'visualizer': 1, 'mocgraph': 1, 'youtopia': 1, 'teleios': 1, 'crowdsourced': 1, 'paqo': 1, 'geospatial': 1, 'streaminsight': 1, 'yading': 1, 'intractable': 1, 'uniqueness': 1, 'erroneous': 1, 'cologne': 1, 'luw': 1, 'meshes': 1, 'executions': 1, 'ocr': 1, 'infopuzzle': 1, 'testbed': 1, 'challenging': 1, 'tail': 1, 'rethink': 1, 'fundamentals': 1, 'probable': 1, 'angie': 1, 
'spilling': 1, 'dbwipes': 1, 'clean': 1, 'deco': 1, 'competitive': 1, 'loose': 1, 'respecting': 1, 'revival': 1, 'traces': 1, 'incompleteness': 1, 'supercharging': 1, 'purchase': 1, 'flashing': 1, 'fever': 1, 'cohadoop': 1, 'exploitation': 1, 'xsds': 1, 'bonxai': 1, 'strength': 1, 'quick': 1, 'mastro': 1, 'studio': 1, 'gs': 1, 'tms': 1, 'threat': 1, 'unicorn': 1, 'responsibility': 1, 'packagebuilder': 1, 'packages': 1, 'oracles': 1, 'truss': 1, 'flipping': 1, 'centroid': 1, 'partitional': 1, 'redoop': 1, 'recurring': 1, 'recall': 1, 'formation': 1, 'piranha': 1, 'short': 1, 'cep': 1, 'targeting': 1, 'embellishing': 1, 'protect': 1, 'restore': 1, 'reusing': 1, 'ufo': 1, 'husky': 1, 'handful': 1, 'seeds': 1, 'turbo': 1, 'charging': 1, 'convergence': 1, 'dbo': 1, 'daq': 1, 'boundaries': 1, 'kafka': 1, 'cloudvista': 1, 'economical': 1, 'moir': 1, 'rates': 1, 'soda': 1, 'timetravel': 1, 'jetscope': 1, 'tramp': 1, 'propagating': 1, 'theta': 1, 'ips': 1, 'package': 1, 'trip': 1, 'datagarage': 1, 'xtreenet': 1, 'democratic': 1, 'rknn': 1, 'mcdb': 1, 'mpp': 1, 'create': 1, 'adaptively': 1, 'okay': 1, 'llama': 1, 'air': 1, 'confucius': 1, 'disciples': 1, 'katara': 1, 'doubling': 1, 'transform': 1, 'privbasis': 1, 'pnuts': 1, 'yahoo': 1, 'serving': 1, 'deepdive': 1, 'mesa': 1, 'piql': 1, 'wysiwyg': 1, 'roadtrack': 1, 'clients': 1, 'coordinated': 1, 'swarm': 1, 'scolopax': 1, 'facilitate': 1, 'capri': 1, 'physicochemical': 1, 'aqwa': 1, 'fli': 1, 'mrshare': 1, 'odyssey': 1, 'multistore': 1, 'replay': 1, 'restrictions': 1, 'crawlers': 1, 'pathology': 1, 'rinse': 1, 'ads': 1, 'places': 1, 'circle': 1, 'evita': 1, 'raced': 1, 'metacompilation': 1, 'parenthood': 1, 'roxxi': 1, 'reviving': 1, 'witness': 1, 'horton': 1, 'eagletree': 1, 'swors': 1, 'shareddb': 1, 'killing': 1, 'stone': 1, 'age': 1, 'integer': 1, 'rasp': 1, 'price': 1, 'race': 1, 'repeats': 1, 'destabilizers': 1, 'composable': 1, 'unaggregated': 1, 'oceanst': 1, 'whole': 1, 'spicy': 1, 'insights': 1, 'snippet': 1, 
'plp': 1, 'latch': 1, 'sapper': 1, 'zinc': 1, 'brighthouse': 1, 'accordion': 1, 'xplus': 1, 'inextcube': 1, 'explanation': 1, 'swans': 1, 'piece': 1, 'copying': 1, 'diversified': 1, 'scales': 1, 'cliques': 1, 'monetdb': 1, 'datacell': 1, 'querymarket': 1, 'pricing': 1, 'markets': 1, 'helpdesk': 1, 'spjua': 1, 'plasma': 1, 'hd': 1, 'makeup': 1, 'labels': 1, 'denial': 1, 'keeping': 1, 'socialite': 1, 'webtables': 1, 'gsketch': 1, 'spaceship': 1, 'workers': 1, 'chimera': 1, 'retrieving': 1, 'prestige': 1, 'agreementmaker': 1, 'bsma': 1, 'numa': 1, 'skysuite': 1, 'attractive': 1, 'repulsive': 1, 'permuting': 1, 'endoscope': 1, 'acquisitional': 1, 'cracked': 1, 'merged': 1, 'syntactically': 1, 'saasfee': 1, 'dremel': 1, 'dem': 1, 'drosophila': 1, 'embryo': 1, 'skipping': 1, 'box': 1, 'crowder': 1, 'whom': 1, 'jury': 1, 'blog': 1, 'bichromatic': 1, 'tolkien': 1, 'storytelling': 1, 'stampede': 1, 'yin': 1, 'yang': 1, 'optimum': 1, 'hamster': 1, 'clicklogs': 1, 'visualizations': 1, 'pangea': 1, 'triplebit': 1, 'augmentation': 1, 'schism': 1, 'geoscope': 1, 'differentiation': 1, 'trajstore': 1, 'iroad': 1, 'reach': 1, 'homomorphism': 1, 'gq': 1, 'guarded': 1, 'potential': 1, 'anti': 1, 'reconfigurable': 1, 'trading': 1, 'supergraph': 1, 'nodb': 1, 'studying': 1, 'methodologies': 1, 'mesh': 1, 'phones': 1, 'mcmc': 1, 'batched': 1, 'bindings': 1, 'means': 1, 'graphlab': 1, 'semandaq': 1, 'bundling': 1, 'xsact': 1, 'simulations': 1, 'clear': 1, 'viral': 1, 'automaton': 1, 'wire': 1, 'beginning': 1, 'sequel': 1, 'beautiful': 1, 'friendship': 1, 'xpedia': 1, 'stmaker': 1, 'islands': 1, 'disassociation': 1, 'sdtw': 1, 'salient': 1, 'alignments': 1, 'induced': 1, 'aster': 1, 'bringing': 1, 'mashed': 1, 'renewed': 1, 'bearing': 1, 'silification': 1, 'scuba': 1, 'diving': 1, 'facebook': 1, 'piggybacking': 1, 'senbazuru': 1, 'mosquito': 1, 'bites': 1, 'upload': 1, 'eventweet': 1, 'triangles': 1, 'ituned': 1, 'gconnect': 1, 'bregman': 1, 'divergence': 1, 'onslaught': 1, 'traclass': 1, 
'skycubes': 1, 'thread': 1, 'cooperation': 1, 'sk': 1, 'datacenter': 1, 'timetrails': 1, 'crawl': 1, 'stratification': 1, 'basics': 1, 'turn': 1, 'quantcast': 1, 'nominal': 1, 'multiplication': 1, 'dependable': 1, 'forecasts': 1, 'vertica': 1, 'later': 1, 'revising': 1, 'matchability': 1, 'pdiffview': 1, 'albatross': 1, 'moderate': 1, 'conditioning': 1, 'expectations': 1, 'uasmas': 1, 'instantaneously': 1, 'snps': 1, 'aid': 1, 'scorpion': 1, 'translations': 1, 'wetsuit': 1, 'fusing': 1, 'attributed': 1, 'soft': 1, 'visqi': 1, 'bufferpool': 1, 'cloudy': 1, 'feed': 1, 'apps': 1, 'apis': 1, 'boxes': 1, 'flow': 1, 'looking': 1, 'embeddability': 1, 'walk': 1, 'click': 1, 'bonding': 1, 'betweenness': 1, 'dates': 1, 'url': 1, 'bidding': 1, 'ed': 1, 'cartography': 1, 'explorers': 1, 'clash': 1, 'titans': 1, 'madlib': 1, 'rankie': 1, 'automorphism': 1, 'terec': 1, 'tweet': 1, 'wires': 1, 'compiler': 1, 'securefiles': 1, 'prepared': 1, 'hive': 1, 'cdas': 1, 'tighter': 1, 'thrash': 1, 'unique': 1, 'counter': 1, 'strike': 1, 'rationing': 1, 'rider': 1, 'enhance': 1, 'ride': 1, 'alae': 1, 'affine': 1, 'biosequence': 1, 'recommenders': 1, 'urls': 1, 'logicblox': 1, 'metalogiql': 1, 'collective': 1, 'shifting': 1, 'too': 1, 'defunctionalization': 1, 'finch': 1, 'propolis': 1, 'provisioned': 1, 'catalogs': 1, 'dblp': 1, 'lessons': 1, 'efq': 1, 'polynomials': 1, 'regret': 1, 'cold': 1, 'trips': 1, 'siberia': 1, 'reef': 1, 'retainable': 1, 'evaluator': 1, 'multicores': 1, 'rcsi': 1, 'icbs': 1, 'slas': 1, 'dobjects': 1, 'metacomputing': 1, 'wrong': 1, 'equality': 1, 'cophy': 1, 'hype': 1, 'coprocessing': 1, 'imaging': 1, 'generates': 1, 'vinery': 1, 'ide': 1, 'easyticket': 1, 'ticket': 1, 'promotion': 1, 'elephant': 1, 'even': 1, 'noticing': 1, 'telescope': 1, 'alternatives': 1, 'secondo': 1, 'anticipatory': 1, 'nadeef': 1, 'ajaxsearch': 1, 'stability': 1, 'selftuning': 1, 'tiamat': 1, 'sxpath': 1, 'leewave': 1, 'coefficients': 1, 'ssds': 1, 'stubby': 1, 'homogeneous': 1, 'skewtune': 
1, 'tableaux': 1, 'artemis': 1, 'statistically': 1, 'significant': 1, 'substrings': 1, 'chi': 1, 'square': 1, 'statistic': 1, 'calculation': 1, 'longest': 1, 'lasting': 1, 'economic': 1, 'mcsam': 1, 'expanding': 1, 'conflicting': 1, 'epps': 1, 'plus': 1, 'pacts': 1, 'nephele': 1, 'worker': 1, 'skill': 1, 'team': 1, 'writer': 1, 'burstiness': 1, 'territorial': 1, 'inzeit': 1, 'insightful': 1, 'mammals': 1, 'flourished': 1, 'before': 1, 'dinosaurs': 1, 'extinct': 1, 'observing': 1, 'variance': 1, 'boosting': 1, 'velocity': 1, 'nomad': 1, 'completion': 1, 'lipstick': 1, 'aide': 1, 'injecting': 1, 'obfuscation': 1, 'moist': 1, 'indexer': 1, 'school': 1, 'igraph': 1, 'comparisons': 1, 'possible': 1, 'azure': 1, 'documentdb': 1, 'release': 1, 'tp': 1, 'close': 1, 'swdb': 1, 'odbis': 1, 'revised': 1, 'selected': 1, 'architektur': 1, 'von': 1, 'informationssystemen': 1, 'reputations': 1, 'xbench': 1, 'benchmarks': 1, 'sema': 1, 'corpora': 1, 'datagridflows': 1, 'datagrids': 1, 'sax': 1, 'pupose': 1, 'methodapplied': 1, 'intrusions': 1, 'so': 1, 'solved': 1, 'facilitation': 1, 'geographically': 1, 'separated': 1, 'causes': 1, 'cassandra': 1, 'impossibility': 1, 'possibility': 1, 'annotations': 1, 'graphgen': 1, 'datahub': 1, 'thirtieth': 1, 'addict': 1, 'instruction': 1, 'chasing': 1, 'actively': 1, 'soliciting': 1, 'joi': 1, 'summingbird': 1, 'batch': 1, 'servicing': 1, 'seismic': 1, 'oil': 1, 'disambiguation': 1, 'adjusting': 1, 'pir': 1, 'drones': 1, 'contribution': 1, 'dl': 1, 'debate': 1, 'rewind': 1, 'sqlite': 1, 'maat': 1, 'sixteenth': 1, 'know': 1, 'been': 1, 'evident': 1, 'microrna': 1, 'precursors': 1, 'hummer': 1, 'achievable': 1, 'imprecision': 1, 'experimenting': 1, 'cdr': 1, 'trusted': 1, 'radix': 1, 'decluster': 1, 'dax': 1, 'widely': 1, 'multitenant': 1, 'hosting': 1, 'yima': 1, 'residential': 1, 'cerfix': 1, 'sovereign': 1, 'partners': 1, 'midstream': 1, 'sbml': 1, 'pathway': 1, 'decapitation': 1, 'economics': 1, 'iterated': 1, 'defense': 1, 'qudas': 1, 
'hyrise': 1, 'standby': 1, 'annotators': 1, 'uls': 1, 'party': 1, 'javascript': 1, 'sensorsafe': 1, 'sensory': 1, 'serializable': 1, 'dai': 1, 'virtualization': 1, 'objectivity': 1, 'gaussian': 1, 'myria': 1, 'phenotype': 1, 'evolutions': 1, 'solve': 1, 'wisely': 1, 'interest': 1, 'quickfoil': 1, 'inductive': 1, 'alarm': 1, 'computationally': 1, 'toronto': 1, 'automate': 1, 'sorts': 1, 'vdlb': 1, 'suppression': 1, 'pixida': 1, 'graphtwist': 1, 'duration': 1, 'hmm': 1, 'nvc': 1, 'boosted': 1, 'lie': 1, 'proposal': 1, 'happens': 1, 'dissecting': 1, 'answerability': 1, 'outline': 1, 'forum': 1, 'responding': 1, 'olxp': 1, 'overcoming': 1, 'great': 1, 'fatigue': 1, 'code': 1, 'managed': 1, 'runtimes': 1, 'population': 1, 'md': 1, 'ties': 1, 'axioms': 1, 'lenses': 1, 'assessing': 1, 'licenses': 1, 'drm': 1, 'plateaus': 1, 'island': 1, 'counterfactual': 1, 'me': 1, 'xmach': 1, 'smoothing': 1, 'packed': 1, 'memcached': 1, 'cohesive': 1, 'rbac': 1, 'homes': 1, 'gmission': 1, 'aruba': 1, 'robot': 1, 'fp': 1, 'bruijn': 1, 'imbalanced': 1, 'feedbackbypass': 1, 'music': 1, 'songs': 1, 'microbenchmark': 1, 'mega': 1, 'kv': 1, 'maximize': 1, 'enhancement': 1, 'asterixdb': 1, 'bdms': 1, 'clusterjoin': 1, 'dtm': 1, 'persistency': 1, 'mmdbms': 1, 'researcher': 1, 'few': 1, 'xoo': 1, 'amplification': 1, 'durability': 1, 'excis': 1, 'adjustable': 1, 'folk': 1, 'developed': 1, 'zoning': 1, 'phishing': 1, 'conjunction': 1, 'jaql': 1, 'scripting': 1, 'interoperating': 1, 'mda': 1, 'seven': 1, 'uda': 1, 'gist': 1, 'unify': 1, 'quantifying': 1, 'eek': 1, 'completing': 1, 'pronto': 1, 'coin': 1, 'giraph': 1, 'unchained': 1, 'barrierless': 1, 'pregel': 1, 'conscience': 1, 'gathering': 1, 'shore': 1, 'em': 1, 'member': 1, 'ensuring': 1, 'printed': 1, 'catalogues': 1, 'matenahzauon': 1, 'recumvely': 1, 'tioga': 1, 'invocation': 1, 'statemachine': 1, 'module': 1, 'staying': 1, 'fit': 1, 'entire': 1, 'xslt': 1, 'rainforest': 1, 'xcheck': 1, 'analyses': 1, 'minicomputer': 1, 'alerter': 1, 
'realistic': 1, 'hyperion': 1, 'cads': 1, 'multipurpose': 1, 'cause': 1, 'innovation': 1, 'cockpit': 1, 'wasserman': 1, 'weber': 1, 'sash': 1, 'progressing': 1, 'catching': 1, 'decisive': 1, 'memex': 1, 'surf': 1, 'trails': 1, 'purpors': 1, 'facts': 1, 'red': 1, 'brick': 1, 'vista': 1, 'recoding': 1, 'compensation': 1, 'serviceglobe': 1, 'distributing': 1, 'nesting': 1, 'ns': 1, 'quiet': 1, 'turmoil': 1, 'various': 1, 'ebusiness': 1, 'ymaldb': 1, 'expiring': 1, 'falcon': 1, 'society': 1, 'hux': 1, 'intersystem': 1, 'transfer': 1, 'injection': 1, 'stanford': 1, 'erhard': 1, 'rahm': 1, 'closures': 1, 'vod': 1, 'hmap': 1, 'sentences': 1, 'denodo': 1, 'interfacing': 1, 'leading': 1, 'facing': 1, 'hisbase': 1, 'lb': 1, 'keogh': 1, 'supports': 1, 'rotation': 1, 'invariance': 1, 'redundancies': 1, 'rtmonitor': 1, 'fastest': 1, 'crimson': 1, 'phylogenetic': 1, 'stdl': 1, 'objectglobe': 1, 'enough': 1, 'easily': 1, 'define': 1, 'replicas': 1, 'middle': 1, 'cactis': 1, 'weighting': 1, 'lanaguage': 1, 'interpretation': 1, 'ghost': 1, 'fraud': 1, 'spreading': 1, 'sdc': 1, 'ellipsoid': 1, 'cirqul': 1, 'fixpoint': 1, 'grown': 1, 'blogscope': 1, 'rum': 1, 'memos': 1, 'tab': 1, 'totals': 1, 'primal': 1, 'hierarchic': 1, 'thm': 1, 'berlinmod': 1, 'sprint': 1, 'critical': 1, 'adiba': 1, 'al': 1, 'intemon': 1, 'dependability': 1, 'iso': 1, 'tc': 1, 'sc': 1, 'wg': 1, 'momis': 1, 'jects': 1, 'realization': 1, 'dvss': 1, 'lopix': 1, 'actual': 1, 'acquired': 1, 'memoriam': 1, 'klaus': 1, 'dittrich': 1, 'pmg': 1, 'weights': 1, 'precomputed': 1, 'continued': 1, 'saga': 1, 'wring': 1, 'dry': 1, 'eev': 1, 'engmeenng': 1, 'ew': 1, 'sibling': 1, 'natix': 1, 'workspace': 1, 'ambiguity': 1, 'tdms': 1, 'philosophies': 1, 'schneider': 1, 'xxl': 1, 'implementations': 1, 'subject': 1, 'shuffling': 1, 'stacked': 1, 'deck': 1, 'foreign': 1, 'implication': 1, 'netbook': 1, 'augmented': 1, 'specialized': 1, 'efficacious': 1, 'together': 1, 'galax': 1, 'rrxs': 1, 'datatype': 1, 'lexicons': 1, 
'surprising': 1, 'carbon': 1, 'nfnf': 1, 'educational': 1, 'governmental': 1, 'os': 1, 'unix': 1, 'awsom': 1, 'coupling': 1, 'prolog': 1, 'recursively': 1, 'multisocket': 1, 'unrolling': 1, 'cycles': 1, 'decide': 1, 'trigger': 1, 'lapses': 1, 'diagrama': 1, 'america': 1, 'advantage': 1, 'mobilizing': 1, 'genericity': 1, 'roadrunner': 1, 'classifiers': 1, 'proving': 1, 'websemantics': 1, 'psycho': 1, 'zoom': 1, 'userviews': 1, 'impure': 1, 'surrogates': 1, 'mangers': 1, 'atomic': 1, 'abduction': 1, 'delivering': 1, 'presentations': 1, 'hbp': 1, 'thesus': 1, 'conjuncts': 1, 'lsd': 1, 'universe': 1, 'ranker': 1, 'apers': 1, 'gray': 1, 'stanley': 1, 'cim': 1, 'mapinfo': 1, 'spatialware': 1, 'datacubes': 1, 'denotational': 1, 'drc': 1, 'alpha': 1, 'integrates': 1, 'svt': 1, 'gbase': 1, 'evaluability': 1, 'consideration': 1, 'necessarily': 1, 'projecting': 1, 'simplified': 1, 'valid': 1, 'yehoshua': 1, 'sagiv': 1, 'mirroring': 1, 'alternating': 1, 'descriptions': 1, 'dpls': 1, 'pol': 1, 'webfilter': 1, 'xnf': 1, 'five': 1, 'deconstructing': 1, 'multiplayer': 1, 'playing': 1, 'games': 1, 'cms': 1, 'rss': 1, 'hern': 1, 'ndez': 1, 'uses': 1, 'save': 1, 'further': 1, 'stratifiable': 1, 'achievement': 1, 'analogical': 1, 'externally': 1, 'nonnumeric': 1, 'orientation': 1, 'pdis': 1, 'multifeature': 1, 'ambientdb': 1, 'indegs': 1, 'supported': 1, 'cfd': 1, 'postprocessing': 1, 'unsolved': 1, 'lachesis': 1, 'conquer': 1, 'excel': 1, 'spirit': 1, 'watchman': 1, 'pictorial': 1, 'benefits': 1, 'sliced': 1, 'idm': 1, 'dataspace': 1, 'while': 1, 'sync': 1, 'contained': 1, 'fist': 1, 'sequencing': 1, 'terabit': 1, 'desired': 1, 'nanotechnology': 1, 'iq': 1, 'multiplex': 1, 'designed': 1, 'metl': 1, 'germ': 1, 'cares': 1, 'udb': 1, 'ego': 1, 'teach': 1, 'watch': 1, 'axis': 1, 'kms': 1, 'decompression': 1, 'assembly': 1, 'leo': 1, 'biotech': 1, 'filing': 1, 'hisa': 1, 'wic': 1, 'indez': 1, 'syntems': 1, 'datamodel': 1, 'svm': 1, 'removing': 1, 'barriers': 1, 'widespread': 1, 
'adoption': 1, 'datascope': 1, 'contents': 1, 'omni': 1, 'juggling': 1, 'feathers': 1, 'bowling': 1, 'balls': 1, 'breaking': 1, 'multiclass': 1, 'goals': 1, 'whither': 1, 'xsl': 1, 'impacts': 1, 'covering': 1, 'shrinking': 1, 'resiliency': 1, 'variations': 1, 'superimposed': 1, 'coding': 1, 'descriptor': 1, 'amie': 1, 'hints': 1, 'journals': 1, 'extinction': 1, 'explosion': 1, 'transducer': 1, 'netweaver': 1, 'bi': 1, 'cyber': 1, 'hear': 1, 'scream': 1, 'propel': 1, 'hypermedia': 1, 'notifications': 1, 'swiss': 1, 'idbd': 1, 'limiting': 1, 'porel': 1, 'inhomogeneous': 1, 'guaranteed': 1, 'stinger': 1, 'oprarruzataon': 1, 'paranoids': 1, 'viztree': 1, 'visually': 1, 'diabetic': 1, 'bilvideo': 1, 'multiuser': 1, 'streamglobe': 1, 'sqlb': 1, 'consumers': 1, 'adabas': 1, 'strongly': 1, 'typed': 1, 'hos': 1, 'miner': 1, 'outlyting': 1, 'starting': 1, 'sometimes': 1, 'ending': 1, 'company': 1, 'practicioner': 1, 'simply': 1, 'converter': 1, 'worlds': 1, 'magnetic': 1, 'cords': 1, 'rotating': 1, 'sytems': 1, 'vipas': 1, 'sox': 1, 'subtype': 1, 'maybms': 1, 'gorder': 1, 'rollups': 1, 'celljoin': 1, 'pairing': 1, 'voip': 1, 'conversations': 1, 'adjustment': 1, 'ndb': 1, 'inegration': 1, 'laura': 1, 'her': 1, 'remit': 1, 'paraphrasing': 1, 'upstream': 1, 'pier': 1, 'caprera': 1, 'commons': 1, 'heartbeat': 1, 'gigascope': 1, 'taxis': 1, 'quadtrees': 1, 'xpathlearner': 1, 'phd': 1, 'located': 1, 'berlin': 1, 'pm': 1, 'orthogonal': 1, 'mix': 1, 'conformity': 1, 'term': 1, 'authentic': 1, 'lh': 1, 'fido': 1, 'learns': 1, 'fetch': 1, 'anyone': 1, 'contest': 1, 'hcc': 1, 'smartcard': 1, 'unstoppable': 1, 'stateful': 1, 'php': 1, 'compensating': 1, 'quble': 1, 'blending': 1, 'prtv': 1, 'increasing': 1, 'parity': 1, 'acceptable': 1, 'agencies': 1, 'shapelets': 1, 'pgstgres': 1, 'variables': 1, 'facilitating': 1, 'perpetual': 1, 'reformulations': 1, 'laguna': 1, 'waters': 1, 'generations': 1, 'visible': 1, 'entrans': 1, 'jafar': 1, 'adibi': 1, 'minining': 1, 'draft': 1, 'joint': 1, 
'jasmin': 1, 'sourced': 1, 'inserts': 1, 'gain': 1, 'profitably': 1, 'tu': 1, 'downfall': 1, 'empire': 1, 'requires': 1, 'ranksql': 1, 'implantation': 1, 'stepwise': 1, 'tras': 1, 'ducer': 1, 'fix': 1, 'needed': 1, 'predicalc': 1, 'replica': 1, 'inequality': 1, 'behavioural': 1, 'annevelink': 1, 'objectoriented': 1, 'cql': 1, 'observations': 1, 'invertible': 1, 'late': 1, 'unscalable': 1, 'contextual': 1, 'insight': 1, 'arena': 1, 'seeking': 1, 'stable': 1, 'blogosphere': 1, 'pasta': 1, 'compositional': 1, 'avi': 1, 'pfeffer': 1, 'proactivity': 1, 'interactiveness': 1, 'andes': 1, 'asymptotically': 1, 'once': 1, 'facekit': 1, 'mudular': 1, 'possibly': 1, 'uncooperative': 1, 'welcome': 1, 'chairs': 1, 'switching': 1, 'ontoquest': 1, 'reduced': 1, 'seoul': 1, 'irisnet': 1, 'unsupervised': 1, 'causal': 1, 'imemex': 1, 'escapes': 1, 'jungle': 1, 'pilot': 1, 'executive': 1, 'umfied': 1, 'translator': 1, 'dialog': 1, 'perfect': 1, 'indicator': 1, 'interpreter': 1, 'toto': 1, 'kansas': 1, 'anymore': 1, 'transitioning': 1, 'talk': 1, 'relationa': 1, 'fas': 1, 'aerospace': 1, 'dy': 1, 'namic': 1, 'interac': 1, 'tive': 1, 'tandem': 1, 'serverware': 1, 'polygon': 1, 'aqua': 1, 'passenger': 1, 'idea': 1, 'irrelevant': 1, 'autonomously': 1, 'computable': 1, 'variant': 1, 'psoup': 1, 'shrex': 1, 'parallelization': 1, 'options': 1, 'debugger': 1, 'loadstar': 1, 'immersive': 1, 'analyzer': 1, 'moshe': 1, 'shadmon': 1, 'vlkdbs': 1, 'safety': 1, 'punctuated': 1, 'trichotomy': 1, 'merlin': 1, 'xqueries': 1, 'simultaneously': 1, 'rateweb': 1, 'establishment': 1, 'ada': 1, 'qualified': 1, 'reflect': 1, 'ub': 1, 'aditi': 1, 'bubbles': 1, 'visweb': 1, 'inexpensive': 1, 'will': 1, 'revolutionize': 1, 'bellcore': 1, 'harness': 1, 'intranets': 1, 'grabber': 1, 'curious': 1, 'buddies': 1, 'tpr': 1, 'rp': 1, 'provide': 1, 'recording': 1, 'voice': 1, 'innovative': 1, 'noise': 1, 'corpus': 1, 'segmentation': 1, 'skyframe': 1, 'infoquilt': 1, 'branching': 1, 'prefiltering': 1, 'traffi': 1, 
'bellwether': 1, 'ease': 1, 'pmr': 1, 'italian': 1, 'card': 1, 'bounding': 1, 'coloring': 1, 'total': 1, 'komagome': 1, 'metropolitan': 1, 'champagne': 1, 'samplinglarge': 1, 'unibase': 1, 'cs': 1, 'infological': 1, 'superjoin': 1, 'ngram': 1, 'encoded': 1, 'lgedbms': 1, 'argument': 1, 'microcomputer': 1, 'deliver': 1, 'orleans': 1, 'proliant': 1, 'datbase': 1, 'descriptive': 1, 'bio': 1, 'accesses': 1, 'updated': 1, 'finance': 1, 'gmine': 1, 'there': 1, 'hope': 1, 'consultable': 1, 'sit': 1, 'amazon': 1, 'com': 1, 'minimality': 1, 'sw': 1, 'clotho': 1, 'decoupling': 1, 'layout': 1, 'retmeval': 1, 'affinity': 1, 'layouts': 1, 'picshark': 1, 'scarcity': 1, 'collaboration': 1, 'thresholds': 1, 'pccp': 1, 'interoperation': 1, 'moby': 1, 'implicit': 1, 'associated': 1, 'quintillabit': 1, 'hyperlarge': 1, 'responsive': 1, 'vimsys': 1, 'hans': 1, 'archis': 1, 'sepia': 1, 'selectivities': 1, 'anipqo': 1, 'intrusive': 1, 'keynote': 1, 'laboratory': 1, 'jasmine': 1, 'radixzip': 1, 'token': 1, 'including': 1, 'prioritized': 1, 'projects': 1, 'worlinfo': 1, 'ecrins': 1, 'faulting': 1, 'citizens': 1, 'risc': 1, 'organized': 1, 'vm': 1, 'lifting': 1, 'burden': 1, 'solver': 1, 'selec': 1, 'tion': 1, 'push': 1, 'pr': 1, 'cis': 1, 'alert': 1, 'passive': 1, 'hotspots': 1, 'find': 1, 'sor': 1, 'tribeca': 1, 'gradients': 1, 'participants': 1, 'walks': 1, 'piphany': 1, 'epicenter': 1, 'juxtaposed': 1, 'entirely': 1, 'abridged': 1, 'approaching': 1, 'automed': 1, 'hetereogeneous': 1, 'partnership': 1, 'lsn': 1, 'dwms': 1, 'telidon': 1, 'videotex': 1, 'metabase': 1, 'compressing': 1, 'rehist': 1, 'coalescing': 1, 'six': 1, 'remembrance': 1, 'overload': 1, 'archived': 1, 'escrow': 1, 'disjunctions': 1, 'iterators': 1, 'klee': 1, 'duplicated': 1, 'poesia': 1, 'agriculture': 1, 'datawarehousing': 1, 'colours': 1, 'eliminate': 1, 'spurious': 1, 'indexable': 1, 'pla': 1, 'legodb': 1, 'rebound': 1, 'sorter': 1, 'sophlst': 1, 'cated': 1, 'ofs': 1, 'percentile': 1, 'review': 1, 'rubicon': 1, 
'possibilistic': 1, 'retrospective': 1, 'ambient': 1, 'target': 1, 'acs': 1, 'minicon': 1, 'datalanguage': 1, 'bureau': 1, 'prdb': 1, 'ase': 1, 'move': 1, 'mediate': 1, 'investigation': 1, 'clustra': 1, 'telecom': 1, 'positive': 1, 'sciport': 1, 'fibonacci': 1, 'parsimonious': 1, 'multiattribute': 1, 'coma': 1, 'immunisation': 1, 'metaphysical': 1, 'decomposing': 1, 'ddb': 1, 'rufus': 1, 'bidirectional': 1, 'xrpc': 1, 'interoperable': 1, 'proquel': 1, 'correctly': 1, 'configuring': 1, 'versioned': 1, 'damia': 1, 'fabric': 1, 'simultaneous': 1, 'multithreading': 1, 'declare': 1, 'sds': 1, 'efforts': 1, 'commercialize': 1, 'multisets': 1, 'outerjoins': 1, 'pull': 1, 'erratic': 1, 'gignomda': 1, 'rolex': 1, 'navigable': 1, 'quickstore': 1, 'mapped': 1, 'bibfinder': 1, 'statminer': 1, 'overlap': 1, 'swering': 1, 'elevation': 1, 'rdfi': 1, 'contexts': 1, 'fragmented': 1, 'fascicles': 1, 'acychc': 1, 'appli': 1, 'cations': 1, 'stories': 1, 'itrails': 1, 'dataspaces': 1, 'crash': 1, 'containing': 1, 'compressive': 1, 'geromesuite': 1, 'inspired': 1, 'considering': 1, 'quantifiers': 1, 'pop': 1, 'fed': 1, 'adsm': 1, 'imagery': 1, 'referencial': 1, 'multivariate': 1, 'sing': 1, 'inverses': 1, 'disco': 1, 'mis': 1, 'shift': 1, 'register': 1, 'preorder': 1, 'postorder': 1, 'filterings': 1, 'spatially': 1, 'usage': 1, 'proclamation': 1, 'telcordia': 1, 'costing': 1, 'strict': 1, 'wrapping': 1, 'synchronized': 1, 'succinct': 1, 'amit': 1, 'odmg': 1, 'sophisticated': 1, 'bypassing': 1, 'microbial': 1, 'img': 1, 'question': 1, 'correspondences': 1, 'breadth': 1, 'traversal': 1, 'opossum': 1, 'desk': 1, 'glue': 1, 'nail': 1, 'intrinsic': 1, 'generalised': 1, 'projected': 1, 'socratic': 1, 'exegesis': 1, 'intertask': 1, 'eniam': 1, 'discriminants': 1, 'union': 1, 'balke': 1, 'gembase': 1, 'brain': 1, 'adms': 1, 'mainframe': 1, 'hosts': 1, 'sq': 1, 'practitioner': 1, 'flowcube': 1, 'flowcubes': 1, 'position': 1, 'regional': 1, 'axiomatic': 1, 'choices': 1, 'viewers': 1, 'analogue': 
1, 'procedure': 1, 'calls': 1, 'landscape': 1, 'evaluations': 1, 'prima': 1, 'attachment': 1, 'lockprotocols': 1, 'centralized': 1, 'mysql': 1, 'old': 1, 'suite': 1, 'lexicon': 1, 'eddies': 1, 'sofis': 1, 'raghunath': 1, 'othayoth': 1, 'proli': 1, 'ant': 1, 'syst': 1, 'maintainable': 1, 'lookups': 1, 'vmldb': 1, 'many': 1, 'heterogenity': 1, 'disnic': 1, 'nicnet': 1, 'accounts': 1, 'occurrence': 1, 'frequencies': 1, 'stand': 1, 'alone': 1, 'increasingly': 1, 'offload': 1, 'blackboard': 1, 'etuner': 1, 'protdb': 1, 'ensembling': 1, 'netherlands': 1, 'ptt': 1, 'itcis': 1, 'university': 1, 'sweeping': 1, 'agora': 1, 'living': 1, 'xseek': 1, 'pseudo': 1, 'schems': 1, 'topaz': 1, 'parallelizer': 1, 'io': 1, 'coherence': 1, 'controls': 1, 'biopatentminer': 1, 'patents': 1, 'multiprogramming': 1, 'baton': 1, 'alphasort': 1, 'hyperfile': 1, 'invalidation': 1, 'multitiered': 1, 'constantly': 1, 'kaleidoscope': 1, 'nonrecursive': 1, 'enthusiastic': 1, 'rather': 1, 'wave': 1, 'websources': 1, 'lcs': 1, 'trim': 1, 'fusem': 1, 'statstream': 1, 'thousands': 1, 'referee': 1, 'researchindex': 1, 'constant': 1, 'permutation': 1, 'gridvine': 1, 'globally': 1, 'kvl': 1, 'multiaction': 1, 'ilog': 1, 'polygen': 1, 'disconnection': 1, 'dashboard': 1, 'green': 1, 'overheads': 1, 'recycling': 1, 'tlashmg': 1, 'enduser': 1, 'guidance': 1, 'icl': 1, 'filestore': 1, 'grail': 1, 'clause': 1, 'udl': 1, 'faceted': 1, 'archaeology': 1, 'lindex': 1, 'checks': 1, 'balances': 1, 'impedance': 1, 'significance': 1, 'swoosh': 1, 'myportal': 1, 'gte': 1, 'superpages': 1, 'fittest': 1, 'survives': 1, 'georgraphic': 1, 'dataguides': 1, 'formulation': 1, 'returning': 1, 'rows': 1, 'statements': 1, 'neurorule': 1, 'connectionist': 1, 'recommending': 1, 'physiological': 1, 'compressibility': 1, 'weaving': 1, 'xbenchmatch': 1, 'certification': 1, 'timestamps': 1, 'realities': 1, 'zoo': 1, 'xcerpt': 1, 'visxcerpt': 1, 'checkpointing': 1, 'those': 1, 'weird': 1, 'want': 1, 'anyway': 1, 'fluxcapacitor': 1, 
'travel': 1, 'stratosphere': 1, 'wmxml': 1, 'afilter': 1, 'nationwide': 1, 'aquery': 1, 'backout': 1, 'cacheportal': 1, 'award': 1, 'aggregators': 1, 'chicago': 1, 'synergistic': 1, 'assurance': 1, 'evidential': 1, 'ac': 1, 'quisition': 1, 'infr': 1, 'groupware': 1, 'lotus': 1, 'domino': 1, 'notes': 1, 'approximations': 1, 'ariel': 1, 'front': 1, 'irrd': 1, 'portugal': 1, 'viator': 1, 'dtd': 1, 'directed': 1, 'gaea': 1, 'literature': 1, 'anorexic': 1, 'score': 1, 'bad': 1, 'ugly': 1, 'mund': 1, 'wildcard': 1, 'left': 1, 'maintain': 1, 'usind': 1, 'unveiling': 1, 'built': 1, 'dewey': 1, 'gmap': 1, 'due': 1, 'charge': 1, 'meter': 1, 'cellsort': 1, 'functionality': 1, 'streamminer': 1, 'ensemble': 1, 'mine': 1, 'drifting': 1, 'geographic': 1, 'roles': 1, 'mls': 1, 'linkclus': 1, 'links': 1, 'personalizing': 1, 'piment': 1, 'coping': 1, 'responses': 1, 'accommodating': 1, 'refining': 1, 'consensus': 1, 'dr': 1, 'qi': 1, 'centralization': 1, 'nile': 1, 'pdt': 1, 'phenomenon': 1, 'environmental': 1, 'dtl': 1, 'dataspot': 1, 'plain': 1, 'pivot': 1, 'unpivot': 1, 'triggered': 1, 'deadline': 1, 'bt': 1, 'branched': 1, 'fourier': 1, 'instantiating': 1, 'graphdb': 1, 'vp': 1, 'color': 1, 'illuminating': 1, 'dark': 1, 'optimizable': 1, 'rollback': 1, 'meikel': 1, 'poess': 1, 'seat': 1, 'reservation': 1, 'japanese': 1, 'railways': 1, 'voting': 1, 'sparcom': 1, 'pi': 1, 'datajoiner': 1, 'deferring': 1, 'lines': 1, 'voodb': 1, 'actiview': 1, 'supersql': 1, 'mist': 1, 'mode': 1, 'eroc': 1, 'neato': 1, 'casper': 1, 'compromising': 1, 'gateways': 1, 'place': 1, 'involving': 1, 'easier': 1, 'thought': 1, 'syntax': 1, 'cmd': 1, 'reloaded': 1, 'dialogue': 1, 'oasis': 1, 'xquec': 1, 'thrashing': 1, 'realm': 1, 'modularity': 1, 'teenage': 1, 'fenecia': 1, 'endurable': 1, 'incorporated': 1, 'sole': 1, 'compacted': 1, 'remora': 1, 'centered': 1, 'unloading': 1, 'terabyte': 1, 'www': 1, 'heptox': 1, 'marrying': 1, 'lru': 1, 'buffers': 1, 'snapshots': 1, 'backup': 1, 'ipac': 1, 'nuits': 1, 
'generators': 1, 'neighbour': 1, 'hifi': 1, 'fan': 1, 'safely': 1, 'references': 1, 'websites': 1, 'datablitz': 1, 'clio': 1, 'conecptual': 1, 'predator': 1, 'request': 1, 'demarcation': 1, 'auxiliary': 1, 'movable': 1, 'head': 1, 'denormalized': 1, 'flat': 1, 'iml': 1, 'inscribed': 1, 'nets': 1, 'harvesting': 1, 'rainbow': 1, 'classroom': 1, 'education': 1, 'resisting': 1, 'anonymized': 1, 'retrievals': 1, 'clv': 1, 'format': 1, 'undirected': 1, 'capture': 1, 'datadict': 1, 'deviants': 1, 'ncr': 1, 'hubble': 1, 'folder': 1, 'bag': 1, 'datalogneg': 1, 'traversals': 1, 'ghostdb': 1, 'hiding': 1, 'prying': 1, 'exponential': 1, 'dictionaries': 1, 'trac': 1, 'reporting': 1, 'bp': 1, 'ql': 1, 'semijoin': 1, 'matter': 1, 'landscapes': 1, 'mountains': 1, 'gazing': 1, 'demonstrations': 1, 'caches': 1, 'holes': 1, 'chronological': 1, 'writesets': 1, 'insertion': 1, 'xqrl': 1, 'disjoint': 1, 'authorizations': 1, 'transition': 1, 'preaggregation': 1, 'clusteredness': 1, 'explicit': 1, 'mv': 1, 'infinite': 1, 'restarting': 1, 'minerva': 1, 'groupwise': 1, 'serve': 1, 'welfare': 1, 'sideway': 1, 'obk': 1, 'bongki': 1, 'moon': 1, 'alliances': 1, 'epoch': 1, 'disaster': 1, 'comparable': 1, 'oxpath': 1, 'exclusive': 1, 'installations': 1, 'spreadsheets': 1, 'bottlenecks': 1, 'bonananzas': 1, 'ranks': 1, 'dist': 1, 'movie': 1, 'patient': 1, 'steroids': 1, 'lies': 1, 'fad': 1, 'tavant': 1, 'sell': 1, 'suppressions': 1, 'sgml': 1, 'manufacturing': 1, 'ws': 1, 'catalognet': 1, 'peering': 1, 'zigzag': 1, 'presumed': 1, 'peterlee': 1, 'reflections': 1, 'substitution': 1, 'riding': 1, 'forwarding': 1, 'atomicity': 1, 'banks': 1, 'krisys': 1, 'loops': 1, 'cb': 1, 'treescape': 1, 'computed': 1, 'irregular': 1, 'naos': 1, 'reactive': 1, 'fde': 1, 'referencing': 1, 'neuroscience': 1, 'globalization': 1, 'clustream': 1, 'gcx': 1, 'reefs': 1, 'hyperqueries': 1, 'psj': 1, 'sharc': 1, 'qql': 1, 'primitive': 1, 'srikant': 1, 'cubetree': 1, 'si': 1, 'undecomposable': 1, 'forget': 1, 'ois': 1, 
'mariposa': 1, 'schedules': 1, 'basket': 1, 'kbms': 1, 'gnatdb': 1, 'footprint': 1, 'accurately': 1, 'sd': 1, 'rtree': 1, 'dynamical': 1, 'arbitrarily': 1, 'rotated': 1, 'symbolic': 1, 'conventional': 1, 'entityrank': 1, 'directly': 1, 'holistically': 1, 'gordian': 1, 'keys': 1, 'ski': 1, 'knowledgeable': 1, 'striptease': 1, 'distributive': 1, 'semantice': 1, 'torsten': 1, 'suel': 1, 'wishful': 1, 'thinking': 1, 'viable': 1, 'provisions': 1, 'obligations': 1, 'ram': 1, 'inlining': 1, 'structurally': 1, 'smooth': 1, 'sections': 1, 'category': 1, 'usable': 1, 'menu': 1, 'aqax': 1, 'crossroads': 1, 'psychological': 1, 'twigs': 1, 'subramanian': 1, 'dimensionally': 1, 'mind': 1, 'grammar': 1, 'accounting': 1, 'row': 1, 'xiss': 1, 'workspaces': 1, 'optimizied': 1, 'fluxquery': 1, 'vsam': 1, 'fsle': 1, 'eztens': 1, 'ble': 1, 'uv': 1, 'claro': 1, 'mediators': 1, 'violation': 1, 'multiplicities': 1, 'slice': 1, 'agflow': 1, 'triples': 1, 'bigsur': 1, 'unlimited': 1, 'quantities': 1, 'alias': 1, 'led': 1, 'deduplication': 1, 'accessibility': 1, 'adn': 1, 'pesto': 1, 'measured': 1, 'biased': 1, 'paragrab': 1, 'uniformity': 1, 'workarounds': 1, 'date': 1, 'diagnostics': 1, 'grape': 1, 'osiris': 1, 'present': 1, 'icicles': 1, 'fertile': 1, 'hole': 1, 'sparsity': 1, 'bubble': 1, 'monet': 1, 'extenders': 1, 'phantom': 1, 'predicative': 1, 'scalablity': 1, 'quaternary': 1, 'chameleon': 1, 'extender': 1, 'timely': 1, 'updatability': 1, 'conductance': 1, 'seaweed': 1, 'utilizations': 1, 'classical': 1, 'braking': 1, 'shifted': 1, 'transformational': 1, 'segmentations': 1, 'huge': 1, 'summary': 1, 'compass': 1, 'html': 1, 'mindreader': 1, 'bloomba': 1, 'scrap': 1, 'compilers': 1, 'orientstore': 1, 'intuitive': 1, 'normalize': 1, 'divide': 1, 'developers': 1, 'ap': 1, 'proaches': 1, 'reconciling': 1, 'koda': 1, 'erratum': 1, 'odbmss': 1, 'au': 1, 'tomated': 1, 'navigations': 1, 'optimally': 1, 'placing': 1, 'orientx': 1, 'windows': 1, 'important': 1, 'holism': 1, 'whiz': 1, 
'multifractals': 1, 'trio': 1, 'odefs': 1, 'skate': 1, 'y': 1, 'deelustering': 1, 'atabase': 1, 'universality': 1, 'offering': 1, 'designer': 1, 'processable': 1, 'molap': 1, 'complements': 1, 'infused': 1, 'synergy': 1, 'maintainance': 1, 'multisource': 1, 'hypervideo': 1, 'connection': 1, 'propagations': 1, 'bufering': 1, 'encompass': 1, 'consequences': 1, 'smoqe': 1, 'imperfect': 1, 'qfilter': 1, 'insecure': 1, 'ones': 1, 'sending': 1, 'dsis': 1, 'interrelational': 1, 'debugging': 1, 'section': 1, 'viewpoint': 1, 'multilingualism': 1, 'paradise': 1, 'digest': 1, 'callassist': 1, 'helping': 1, 'agents': 1, 'uldbs': 1, 'morphing': 1, 'formalisms': 1, 'accessses': 1, 'vxmlr': 1, 'trax': 1, 'substitutability': 1, 'published': 1, 'cryptography': 1, 'discover': 1, 'ltering': 1, 'docu': 1, 'ments': 1, 'queues': 1, 'decoding': 1, 'melodies': 1, 'renaissance': 1, 'polyphony': 1, 'john': 1, 'funderburk': 1, 'warlock': 1, 'atlas': 1, 'but': 1, 'texture': 1, 'template': 1, 'inter': 1, 'customization': 1, 'combinatorial': 1, 'curio': 1, 'raster': 1, 'satellite': 1, 'remusdb': 1, 'plug': 1, 'play': 1, 'preparation': 1, 'constructors': 1, 'gpx': 1, 'reason': 1, 'walking': 1, 'refined': 1, 'hyperstorm': 1, 'combating': 1, 'trustrank': 1, 'asera': 1, 'extranet': 1, 'surrounder': 1, 'geographical': 1, 'scopes': 1, 'vgram': 1, 'phantoms': 1, 'spheresearch': 1, 'bhunt': 1, 'stairs': 1, 'dht': 1, 'concerning': 1, 'governors': 1, 'destaging': 1, 'siren': 1, 'cxhist': 1, 'automotive': 1, 'randomizing': 1, 'protecting': 1, 'compromise': 1, 'curve': 1, 'fitting': 1, 'workfile': 1, 'mergesorts': 1, 'deletes': 1, 'rqa': 1, 'fqi': 1, 'restricting': 1, 'formula': 1, 'criterion': 1, 'sting': 1, 'offline': 1, 'boundary': 1, 'mandatory': 1, 'dawn': 1, 'dct': 1, 'fragmentation': 1, 'menory': 1, 'eufid': 1, 'methodologic': 1, 'instalation': 1, 'brazil': 1, 'segments': 1, 'command': 1, 'hits': 1, 'grna': 1, 'programmable': 1, 'genomics': 1, 'opt': 1, 'gloss': 1, 'broker': 1, 'millennium': 1, 
'mrp': 1, 'reed': 1, 'orthogonally': 1, 'cape': 1, 'ida': 1, 'uqlips': 1, 'cam': 1, 'masking': 1, 'accelerated': 1, 'dip': 1, 'suppressing': 1, 'prevent': 1, 'netcube': 1, 'multiversionxml': 1, 'dms': 1, 'uload': 1, 'differences': 1, 'institutions': 1, 'japan': 1, 'acme': 1, 'step': 1, 'realizing': 1, 'rendezvous': 1, 'casual': 1, 'freeform': 1, 'emerging': 1, 'baes': 1, 'conservation': 1, 'immediate': 1, 'focus': 1, 'informed': 1, 'insite': 1, 'dbgraph': 1, 'cnn': 1, 'semantlc': 1, 'optlmlzatlon': 1, 'relatlonal': 1, 'bdual': 1, 'filling': 1, 'curves': 1, 'exception': 1, 'interviso': 1, 'cure': 1, 'sccs': 1, 'advances': 1, 'formalism': 1, 'xprs': 1, 'icot': 1, 'medoids': 1, 'nexusscout': 1, 'unification': 1, 'multilevel': 1, 'probe': 1, 'accessible': 1, 'tigukat': 1, 'objectbase': 1, 'cached': 1, 'multiscale': 1, 'formalization': 1, 'ims': 1, 'threading': 1, 'performing': 1, 'deletions': 1, 'straight': 1, 'linux': 1, 'transposition': 1, 'conversational': 1, 'dan': 1, 'suciu': 1, 'commutativity': 1, 'mars': 1, 'undo': 1, 'diag': 1, 'unql': 1, 'dbs': 1, 'combi': 1, 'rajeev': 1, 'rastogi': 1, 'expres': 1, 'sions': 1}
In [7]:
freq=list(voc_sorted.values())
print(freq)
[1343, 1078, 1070, 1010, 984, 945, 750, 517, 455, 397, 347, 323, 323, 309, 293, 280, 279, 250, 240, 229, 218, 201, 200, 200, 189, 171, 170, 159, 143, 142, 131, 127, 127, 118, 117, 117, 113, 112, 101, 101, 95, 95, 94, 93, 92, 91, 89, 87, 85, 85, 83, 83, 80, 80, 79, 79, 78, 78, 77, 77, 76, 74, 74, 72, 72, 72, 71, 70, 68, 68, 68, 67, 65, 65, 65, 63, 63, 63, 63, 62, 61, 60, 60, 60, 59, 59, 59, 56, 56, 55, 54, 53, 52, 51, 51, 50, 50, 49, 49, 48, 48, 48, 48, 47, 45, 45, 45, 45, 45, 45, 45, 44, 44, 44, 43, 42, 42, 41, 41, 41, 41, 41, 40, 40, 39, 39, 39, 38, 38, 38, 37, 37, 37, 37, 36, 36, 36, 36, 35, 35, 35, 35, 34, 34, 34, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Heaps'_Law

How big is the term vocabulary?

How many distinct words are there? In theory, the vocabulary can be very large if we gauge it by the number of possible letter combinations. Suppose there are 26 letters, ignoring the distinction between upper and lower case, and that a word can be up to 20 letters long. Then there are at least $26^{20}$ possible words, an extremely large number.
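
To get a feel for the size of this bound, the count can be computed directly. This is a small illustrative sketch; the variable names are not part of the original notebook.

```python
# Number of strings of exactly 20 lowercase letters.
exactly_20 = 26 ** 20

# Number of non-empty strings of at most 20 lowercase letters.
up_to_20 = sum(26 ** n for n in range(1, 21))

print(f"{exactly_20:.3e}")
print(f"{up_to_20:.3e}")
```

Python integers have arbitrary precision, so both values are exact before formatting.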

In reality, the vocabulary size $V$ is constrained by the corpus. The vocabulary keeps growing with the collection size $T$ (the number of tokens). We know that $V \leq T$. But how much smaller is $V$?

Heaps' law:

$$ V = k T^b $$

Typical values for the parameters $k$ and $b$ are: $30 \leq k \leq 100$ and $b\approx 0.5$.

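Heaps' law is linear in log-log space: taking logarithms of both sides gives $\log V = \log k + b \log T$, a straight line with slope $b$ and intercept $\log k$. It is the simplest possible relationship between collection size and vocabulary size in log-log space.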

In [8]:
T=sum(freq) # text length (total number of tokens)
V=len(freq) # vocabulary size (number of distinct words)
k=V/ T**0.5 # solve V = k*T**b for k, assuming b = 0.5
print('k is '+ str(k) )
print(T)
print(V)
k is 24.12405997331913
36091
4583
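
The cell above fixes $b = 0.5$ and solves for $k$ from a single $(T, V)$ pair. A possible alternative, sketched here under stated assumptions, is to record the vocabulary size after every token and fit both parameters with a least-squares line in log-log space. The helper names `vocabulary_growth` and `fit_heaps` are hypothetical, and the synthetic token ids drawn with `rng.zipf` only stand in for a real corpus.

```python
import numpy as np

def vocabulary_growth(tokens):
    """Return arrays (T, V): tokens read so far and distinct tokens seen so far."""
    seen = set()
    sizes = []
    for tok in tokens:
        seen.add(tok)
        sizes.append(len(seen))
    T = np.arange(1, len(sizes) + 1)
    return T, np.array(sizes)

def fit_heaps(tokens):
    """Least-squares fit of log V = log k + b log T; returns (k, b)."""
    T, V = vocabulary_growth(tokens)
    # polyfit returns the slope first, then the intercept, for degree 1.
    b, log_k = np.polyfit(np.log(T), np.log(V), 1)
    return np.exp(log_k), b

# Assumption: synthetic token ids from a heavy-tailed distribution,
# used only so that the sketch runs without the corpus file.
rng = np.random.default_rng(0)
demo_tokens = rng.zipf(1.5, size=20000).tolist()
k_hat, b_hat = fit_heaps(demo_tokens)
print(f"k = {k_hat:.2f}, b = {b_hat:.2f}")
```

To apply this to the notebook's corpus, the tokens would have to be kept in reading order, since the sorted `freq` list no longer carries that order.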

Zipf's_law

Now we have characterized the growth of the vocabulary in collections. We also want to know how many frequent vs. infrequent terms we should expect in a collection. In natural language, there are a few very frequent terms and very many very rare terms.

Zipf's law: The $i$-th most frequent term has frequency $f_i$ that is proportional to $1/i$.

\begin{align} f_i \propto \frac{1}{i} \end{align}

Here $f_i$ is the frequency of the $i$-th most frequent word. An intuitive interpretation of Zipf's law is that, for each word, multiplying its rank by its frequency gives roughly the same constant.

Zipf's law works well for terms in the middle section of the ranking. For the most frequent words it does not fit very well: the slope there is normally flatter than in the middle section.

If a data set fits Zipf's law perfectly, $f_2$ is half of $f_1$, and $f_3$ is one third of $f_1$. Normally, however, the top few words do not fit the line well, as we see in the plot.
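
The rank-times-frequency interpretation can be checked on the ten most frequent words. The sketch below hard-codes the counts printed by `most_common(10)` earlier in this notebook; the variable names are illustrative.

```python
# Top-10 counts copied from the most_common(10) output earlier in this notebook.
top10 = [("for", 1343), ("a", 1078), ("data", 1070), ("of", 1010), ("in", 984),
         ("and", 945), ("the", 750), ("database", 517), ("on", 455), ("databases", 397)]

# Under an exact Zipf's law, rank * frequency would be the same for every word.
products = [(word, rank * count) for rank, (word, count) in enumerate(top10, start=1)]
for word, product in products:
    print(f"{word:10s} {product}")
```

The products range from 1343 at rank 1 to 5670 at rank 6, so they are far from constant over the top ranks. This matches the remark that the most frequent words fit the law poorly.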

In [9]:
import matplotlib.pyplot as plt
import numpy as np

word_rank=np.arange(1, len(freq)+1) # ranks 1..V
plt.scatter(word_rank,freq)
plt.title("Zipf's law")
plt.ylabel("Number of Occurrences")
plt.xlabel("Rank of words") 
plt.show()
<Figure size 640x480 with 1 Axes>

loglog plot

This plot does not give us much information: it looks almost like a 90-degree angle. The frequencies of the top words drop rapidly, and then most of the remaining words line up along an almost straight line near the horizontal axis.

Taking the logarithm of both sides of $ f_i = c i^{-1}$, we have

$$ \log f_i = \log c - \log i $$

Hence the log-log plot shows an almost straight line.

In [10]:
plt.loglog(word_rank,freq, marker='o', linestyle='None')
X=word_rank
k=3500 # constant of the reference line y = k/x
Y=k/X
plt.plot(X,Y)
plt.title("Zipf's law")
plt.ylabel("Number of Occurrences")
plt.xlabel("Rank of words")
plt.show()

Now we can read the frequencies of the top words off the plot. For example, the top word, 'for', occurs a little more than 1000 times (1343), and the next word, 'a', occurs around 1000 times (1078). The straight line in the plot is the reference curve, with $k = 3500$:

$$ y=\frac{k}{x} $$
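
Instead of fixing the constant and the slope by hand, both can be estimated with a least-squares line in log-log space, optionally restricted to the middle ranks where the law fits best. This is a minimal sketch: the function `fit_zipf` and its `lo`/`hi` parameters are hypothetical, and the perfectly Zipfian input is synthetic data used only as a sanity check.

```python
import numpy as np

def fit_zipf(freq, lo=1, hi=None):
    """Fit log f = log c - s * log i over ranks lo..hi; returns (c, s)."""
    freq = np.asarray(freq, dtype=float)
    rank = np.arange(1, len(freq) + 1)
    hi = len(freq) if hi is None else hi
    window = slice(lo - 1, hi)  # ranks are 1-based, indices are 0-based
    slope, intercept = np.polyfit(np.log(rank[window]), np.log(freq[window]), 1)
    return np.exp(intercept), -slope

# Assumption: ideal counts f_i = 3500 / i, not the notebook's real data.
ideal = 3500 / np.arange(1, 1001)
c_hat, s_hat = fit_zipf(ideal, lo=10, hi=500)
print(c_hat, s_hat)
```

On real counts, a call such as `fit_zipf(freq, lo=10, hi=1000)` would give a fitted constant and exponent for that window; the window bounds here are arbitrary and the result depends on them.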

Frequency of Frequency

How many rare words are there? More specifically, how many words occur only once? Such words are called hapax legomena. How many words occur twice (called dis legomena)? Is there a relation (such as a ratio) between those two kinds of words? Is there a rule we can apply to deduce such counts?

In [11]:
hist = {}

for i in freq:
    hist[i] = hist.get(i, 0) + 1
print(hist)
{1343: 1, 1078: 1, 1070: 1, 1010: 1, 984: 1, 945: 1, 750: 1, 517: 1, 455: 1, 397: 1, 347: 1, 323: 2, 309: 1, 293: 1, 280: 1, 279: 1, 250: 1, 240: 1, 229: 1, 218: 1, 201: 1, 200: 2, 189: 1, 171: 1, 170: 1, 159: 1, 143: 1, 142: 1, 131: 1, 127: 2, 118: 1, 117: 2, 113: 1, 112: 1, 101: 2, 95: 2, 94: 1, 93: 1, 92: 1, 91: 1, 89: 1, 87: 1, 85: 2, 83: 2, 80: 2, 79: 2, 78: 2, 77: 2, 76: 1, 74: 2, 72: 3, 71: 1, 70: 1, 68: 3, 67: 1, 65: 3, 63: 4, 62: 1, 61: 1, 60: 3, 59: 3, 56: 2, 55: 1, 54: 1, 53: 1, 52: 1, 51: 2, 50: 2, 49: 2, 48: 4, 47: 1, 45: 7, 44: 3, 43: 1, 42: 2, 41: 5, 40: 2, 39: 3, 38: 3, 37: 4, 36: 4, 35: 4, 34: 8, 33: 5, 32: 7, 31: 4, 30: 7, 29: 13, 28: 7, 27: 4, 26: 12, 25: 14, 24: 9, 23: 11, 22: 13, 21: 14, 20: 11, 19: 16, 18: 14, 17: 17, 16: 22, 15: 33, 14: 28, 13: 34, 12: 37, 11: 35, 10: 47, 9: 56, 8: 69, 7: 74, 6: 101, 5: 137, 4: 207, 3: 327, 2: 692, 1: 2356}
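
The same histogram can also be built with `Counter`, which the notebook already uses for word counts. The sketch below checks the equivalence on a small made-up list, `sample_freq`, which stands in for `freq`.

```python
from collections import Counter

# Small hypothetical frequency list, standing in for `freq`.
sample_freq = [5, 3, 3, 1, 1, 1]

# Manual histogram, as in the cell above.
manual = {}
for f in sample_freq:
    manual[f] = manual.get(f, 0) + 1

# Counter builds the same mapping in one call.
hist_counter = Counter(sample_freq)
print(dict(hist_counter) == manual)
```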
In [12]:
Y=list(hist.values())
X=list(hist.keys())
print(X)
print(Y)
[1343, 1078, 1070, 1010, 984, 945, 750, 517, 455, 397, 347, 323, 309, 293, 280, 279, 250, 240, 229, 218, 201, 200, 189, 171, 170, 159, 143, 142, 131, 127, 118, 117, 113, 112, 101, 95, 94, 93, 92, 91, 89, 87, 85, 83, 80, 79, 78, 77, 76, 74, 72, 71, 70, 68, 67, 65, 63, 62, 61, 60, 59, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 3, 1, 1, 3, 1, 3, 4, 1, 1, 3, 3, 2, 1, 1, 1, 1, 2, 2, 2, 4, 1, 7, 3, 1, 2, 5, 2, 3, 3, 4, 4, 4, 8, 5, 7, 4, 7, 13, 7, 4, 12, 14, 9, 11, 13, 14, 11, 16, 14, 17, 22, 33, 28, 34, 37, 35, 47, 56, 69, 74, 101, 137, 207, 327, 692, 2356]
In [13]:
# each frequency x is shared by hist[x] words, so summing x*hist[x] recovers the text length
s=0
for x in X:
    s=s+x*hist.get(x,0)
print("text length is  " +str(s))
text length is  36091
In [14]:
plt.loglog(X,Y,marker="o", linestyle='None')
plt.xlabel("Word Frequency")
plt.ylabel("Freq of Freq") 
plt.show()
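
One idealized way to relate these counts, offered as a sketch rather than a result established in this notebook: if frequencies followed $f_i = c/i$ exactly, the words occurring exactly $n$ times would occupy the ranks between $c/(n+1)$ and $c/n$, so there would be about $c/(n(n+1))$ of them. The last rank is about $c$, so the vocabulary size is $V \approx c$, and the expected fraction of words with frequency $n$ is $1/(n(n+1))$. The code compares this with the first few counts from the histogram printed above; `expected_count` is a hypothetical helper.

```python
# Observed counts copied from the histogram printed above: {frequency: number of words}.
observed = {1: 2356, 2: 692, 3: 327, 4: 207, 5: 137}
V = 4583  # vocabulary size reported earlier in this notebook

def expected_count(n, vocab_size):
    """Words expected to occur exactly n times if f_i = c/i held exactly."""
    return vocab_size / (n * (n + 1))

for n, count in observed.items():
    print(n, count, round(expected_count(n, V), 1))
```

For $n = 1$ the idealized count is $4583/2 = 2291.5$ against 2356 observed, and for $n = 2$ it is about 764 against 692 observed. The rule gives the right order of magnitude here, although the fit is not exact.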
In [53]:
print(hist[1])
print(len(voc))
print(hist[1]/len(voc))
2356
4583
0.5140737508182414

From the above we can see that about half of the words are hapax legomena. In many algorithms, rare words are removed. Zipf's law gives us a guideline as to how many words will be removed: if we remove the hapax legomena, the vocabulary size is reduced by about half.
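
Removing rare words can be sketched as a filter on the counter. The function `prune_rare`, its `min_count` parameter, and the toy vocabulary are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter

def prune_rare(counts, min_count=2):
    """Keep only words that occur at least min_count times."""
    return Counter({w: c for w, c in counts.items() if c >= min_count})

# Tiny hypothetical vocabulary, standing in for `voc`.
toy = Counter({"data": 4, "query": 2, "gordian": 1, "pesto": 1})
kept = prune_rare(toy)

print(len(toy), len(kept))                    # distinct words before and after
print(sum(toy.values()), sum(kept.values()))  # total tokens before and after
```

On the corpus above, dropping the 2356 hapax legomena removes about half of the 4583 distinct words but only 2356 of the 36091 tokens, roughly 6.5% of the text.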


Project_topics

  1. Can you fit Zipf's law using linear regression? Is there a difference between corpora (of different types and different sizes)?
  2. What is the slope of the freq-freq plot? Can we prove that the slope is one plus the slope of Zipf's law?
  3. Study the scalability of the code. For instance, what is a faster method for tokenizing documents? What are the bad methods we should watch out for? Efficient algorithms can be your project topic for every sub-topic, for example, how to efficiently calculate the co-occurrence matrix later on.

Resources

  1. Power-Law Distributions in Empirical Data https://www.cse.cuhk.edu.hk/~cslui/CMSC5734/Clauset_Shalizi_Newman_09.pdf