Found 322 repositories(showing 30)
mlfoundations
Editing Models with Task Arithmetic
jwieting
Python code for training all models in the ICLR paper, "Towards Universal Paraphrastic Sentence Embeddings". These models achieve strong performance on semantic similarity tasks without any training or tuning on the training data for those tasks. They also can produce features that are at least as discriminative as skip-thought vectors for semantic similarity tasks at a minimum. Moreover, this code can achieve state-of-the-art results on entailment and sentiment tasks.
mfaruqui
Easy to use scripts for evaluating word vectors on a variety of tasks.
roeehendel
No description available
reddyprasade
In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features. Learning problems fall into a few categories: supervised learning, in which the data comes with additional attributes that we want to predict (Click here to go to the scikit-learn supervised learning page).This problem can be either: classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem would be handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning where one has a limited number of categories and for each of the n samples provided, one is to try to label them with the correct category or class. regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight. unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization (Click here to go to the Scikit-Learn unsupervised learning page).
AntoAndGar
Task Singular Vectors: Reducing Task Interference in Model Merging. Merge models avoiding task interference through separable models.
nathanielyvo
The official repository of "Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors""
alhojel
No description available
CallMeJiaGu
用于词向量的相似性任务、类比任务的数据。包含了中文新闻语料分类的实验数据。(Data for similarity tasks and analogy tasks of word vectors. The experimental data of Chinese news corpus classification are included.)
fredzzhang
[NeurIPS'24] Official PyTorch implementation for paper "Knowledge Composition using Task Vectors with Learned Anisotropic Scaling"
aimagelab
Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025
averule
Digital image processing helps replace several mundane activities. My project in final year was ‘Application Control using Hand Gesture Recognition From 3-Dimensional Images’. We worked extensively on processing a 3-D image to recognize the encrypted gesture, with the added 3rd dimension for more number of gestures. We started with extensive image segmentation to suppress background, handle dynamic lighting conditions and extract only portion of hand from the 3-D image. A tedious task, we researched many papers on IEEE and books on Image processing, to come-up with a complex code logic involving a combination of three different segmentation techniques involving RGB, YCbCr and HSV models. Further, with mathematical calculations involving use of Eigen values and Eigen vectors were derived based on the co-variance matrix, generated as per captured image, we managed to extract maximum information. A Euclidean distance was calculated to denote the deviation of captured image from each set of pre-captured images for all defined gestures. Hectic code optimization helped save precious execution time. The gesture corresponding to minimal Euclidean distance was the identified gesture. This project acquainted me with MATLAB and I learnt about various image enhancement and image segmentation techniques. I presented a paper at IEEE’s International Conference on Convergence of Technology (I2CT), Pune, published in the journal with ISBN “978-1-4799-3759-2”.
duhaime
Utilities for clustering GloVe vectors for NLP tasks
With the explosion of social networking sites, blogs and review sites a lot of information is available on the web. This information contains emotions and opinions about various product features and the makers of these products. This form of opinion and feedback is important to the companies developing these products as well as the companies that want to develop better rival products. Sentiment Analysis is the task of analyzing all this data, retrieving opinions about these products and services and classifying them as positive or negative, in other words good or bad. The key parts of any review of any product are the numeric rating and the textual description provided along with this product. In our project we will take into consideration both these vectors for product reviews to conclusively decide on a classifier that is best suited to analysis of product reviews. We have gathered reviews and based on the features that best describe the sentiment for each review, we have created a feature set of 1000 features, and with this limited set we will determine which classifier gives the best result on review type data. To determine the best classifier we perform evaluations on it, by running various data set generators, calculating the resubstitution and generalization errors for each classifier. We then use the mean of these results to compute the paired Student’s t-test to relatively compare the performance of the classifiers. Based on the results of this evaluation, we can state which is the best classifier.
This repository contains the Turkish word vectors and analogical reasoning task pairs produced and used during the study.
anujkumarthakur
Introduction Note: This edition of the book is the same as The Rust Programming Language available in print and ebook format from No Starch Press. Welcome to The Rust Programming Language, an introductory book about Rust. The Rust programming language helps you write faster, more reliable software. High-level ergonomics and low-level control are often at odds in programming language design; Rust challenges that conflict. Through balancing powerful technical capacity and a great developer experience, Rust gives you the option to control low-level details (such as memory usage) without all the hassle traditionally associated with such control. Who Rust Is For Rust is ideal for many people for a variety of reasons. Let’s look at a few of the most important groups. Teams of Developers Rust is proving to be a productive tool for collaborating among large teams of developers with varying levels of systems programming knowledge. Low-level code is prone to a variety of subtle bugs, which in most other languages can be caught only through extensive testing and careful code review by experienced developers. In Rust, the compiler plays a gatekeeper role by refusing to compile code with these elusive bugs, including concurrency bugs. By working alongside the compiler, the team can spend their time focusing on the program’s logic rather than chasing down bugs. Rust also brings contemporary developer tools to the systems programming world: Cargo, the included dependency manager and build tool, makes adding, compiling, and managing dependencies painless and consistent across the Rust ecosystem. Rustfmt ensures a consistent coding style across developers. The Rust Language Server powers Integrated Development Environment (IDE) integration for code completion and inline error messages. By using these and other tools in the Rust ecosystem, developers can be productive while writing systems-level code. Students Rust is for students and those who are interested in learning about systems concepts. Using Rust, many people have learned about topics like operating systems development. The community is very welcoming and happy to answer student questions. Through efforts such as this book, the Rust teams want to make systems concepts more accessible to more people, especially those new to programming. Companies Hundreds of companies, large and small, use Rust in production for a variety of tasks. Those tasks include command line tools, web services, DevOps tooling, embedded devices, audio and video analysis and transcoding, cryptocurrencies, bioinformatics, search engines, Internet of Things applications, machine learning, and even major parts of the Firefox web browser. Open Source Developers Rust is for people who want to build the Rust programming language, community, developer tools, and libraries. We’d love to have you contribute to the Rust language. People Who Value Speed and Stability Rust is for people who crave speed and stability in a language. By speed, we mean the speed of the programs that you can create with Rust and the speed at which Rust lets you write them. The Rust compiler’s checks ensure stability through feature additions and refactoring. This is in contrast to the brittle legacy code in languages without these checks, which developers are often afraid to modify. By striving for zero-cost abstractions, higher-level features that compile to lower-level code as fast as code written manually, Rust endeavors to make safe code be fast code as well. The Rust language hopes to support many other users as well; those mentioned here are merely some of the biggest stakeholders. Overall, Rust’s greatest ambition is to eliminate the trade-offs that programmers have accepted for decades by providing safety and productivity, speed and ergonomics. Give Rust a try and see if its choices work for you. Who This Book Is For This book assumes that you’ve written code in another programming language but doesn’t make any assumptions about which one. We’ve tried to make the material broadly accessible to those from a wide variety of programming backgrounds. We don’t spend a lot of time talking about what programming is or how to think about it. If you’re entirely new to programming, you would be better served by reading a book that specifically provides an introduction to programming. How to Use This Book In general, this book assumes that you’re reading it in sequence from front to back. Later chapters build on concepts in earlier chapters, and earlier chapters might not delve into details on a topic; we typically revisit the topic in a later chapter. You’ll find two kinds of chapters in this book: concept chapters and project chapters. In concept chapters, you’ll learn about an aspect of Rust. In project chapters, we’ll build small programs together, applying what you’ve learned so far. Chapters 2, 12, and 20 are project chapters; the rest are concept chapters. Chapter 1 explains how to install Rust, how to write a Hello, world! program, and how to use Cargo, Rust’s package manager and build tool. Chapter 2 is a hands-on introduction to the Rust language. Here we cover concepts at a high level, and later chapters will provide additional detail. If you want to get your hands dirty right away, Chapter 2 is the place for that. At first, you might even want to skip Chapter 3, which covers Rust features similar to those of other programming languages, and head straight to Chapter 4 to learn about Rust’s ownership system. However, if you’re a particularly meticulous learner who prefers to learn every detail before moving on to the next, you might want to skip Chapter 2 and go straight to Chapter 3, returning to Chapter 2 when you’d like to work on a project applying the details you’ve learned. Chapter 5 discusses structs and methods, and Chapter 6 covers enums, match expressions, and the if let control flow construct. You’ll use structs and enums to make custom types in Rust. In Chapter 7, you’ll learn about Rust’s module system and about privacy rules for organizing your code and its public Application Programming Interface (API). Chapter 8 discusses some common collection data structures that the standard library provides, such as vectors, strings, and hash maps. Chapter 9 explores Rust’s error-handling philosophy and techniques. Chapter 10 digs into generics, traits, and lifetimes, which give you the power to define code that applies to multiple types. Chapter 11 is all about testing, which even with Rust’s safety guarantees is necessary to ensure your program’s logic is correct. In Chapter 12, we’ll build our own implementation of a subset of functionality from the grep command line tool that searches for text within files. For this, we’ll use many of the concepts we discussed in the previous chapters. Chapter 13 explores closures and iterators: features of Rust that come from functional programming languages. In Chapter 14, we’ll examine Cargo in more depth and talk about best practices for sharing your libraries with others. Chapter 15 discusses smart pointers that the standard library provides and the traits that enable their functionality. In Chapter 16, we’ll walk through different models of concurrent programming and talk about how Rust helps you to program in multiple threads fearlessly. Chapter 17 looks at how Rust idioms compare to object-oriented programming principles you might be familiar with. Chapter 18 is a reference on patterns and pattern matching, which are powerful ways of expressing ideas throughout Rust programs. Chapter 19 contains a smorgasbord of advanced topics of interest, including unsafe Rust, macros, and more about lifetimes, traits, types, functions, and closures. In Chapter 20, we’ll complete a project in which we’ll implement a low-level multithreaded web server! Finally, some appendixes contain useful information about the language in a more reference-like format. Appendix A covers Rust’s keywords, Appendix B covers Rust’s operators and symbols, Appendix C covers derivable traits provided by the standard library, Appendix D covers some useful development tools, and Appendix E explains Rust editions. There is no wrong way to read this book: if you want to skip ahead, go for it! You might have to jump back to earlier chapters if you experience any confusion. But do whatever works for you. An important part of the process of learning Rust is learning how to read the error messages the compiler displays: these will guide you toward working code. As such, we’ll provide many examples that don’t compile along with the error message the compiler will show you in each situation. Know that if you enter and run a random example, it may not compile! Make sure you read the surrounding text to see whether the example you’re trying to run is meant to error. Ferris will also help you distinguish code that isn’t meant to work:
ajitrajasekharan
Clustering learned BERT vectors for downstream tasks like unsupervised NER, unsupervised sentence embeddings etc.
JingheZ
In this project, there are two major tasks: text data processing and text categorization. In text data processing, we have done tokenization, stemming, normalization, etc. Also, vector space model and statistical language models are used to retrieve similar documents to query. In text categorization, we build a text classification system which includes feature selection, classifiers (Naive Bayes and K Nearest Neighbor using brute force and random vectors), cross validation, and parameter tuning.
kadircet
Tasks in our ctf, with little attack vectors ^^
Upper9527
Official Code for the Paper "Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning"
kadircet
CTF Tasks for HackMETU'16, with some attack vectors
maragraziani
With this library you will be able to apply concept attribution to your task. You will find the functions to compute concept measures on your data, to learn the regression concept vectors and to generate concept based explanations.
Petros89
Real-time video streaming for automated human subtle behavior recognition. The framework uses an open source library to track the facial feature points and the rigid body face transformations on each frame. Unlike activity recognition tasks, where the algorithm can only globally recognize a certain activity (i.e soccer, basketball, running, etc), the proposed behavioral recognition framework buffers local spatio-temporal vectors to detect and track subtle movements to perceive human behavior.
WamesM
Enhancers are small regions of DNA that bind to proteins, which enhance the tran-scription of genes. The enhancer may be located upstream or downstream of the gene. It is not necessarily close to the gene to be acted on, because the entanglement structure of chromatin allows the positions far apart in the sequence to have the op-portunity to contact each other. Therefore, identifying enhancers and their strength is a complex and challenging task. In this article, a new prediction method based on deep learning is proposed to identify enhancers and enhancer strength, called iEnhancer-DCLA. Firstly, we use word2vec to convert k-mers into number vectors to construct an input matrix. Secondly, we use convolutional neural network and bidi-rectional long short-term memory network to extract sequence features, and finally use the attention mechanism to extract relatively important features. In the task of predicting enhancers and their strengths, this method has improved to a certain ex-tent in most evaluation indexes. In summary, we believe that this method provides new ideas in the analysis of enhancers.
Ighina
An implementation of the paper [Vector-Quantization-Based Topic Modeling](https://dl.acm.org/doi/10.1145/3450946), providing a series of VQ-VAE models for topic modelling. The model reaches state-of-the-art performance on Ng20 and enables the extraction of dense topic vectors for downstream tasks.
Sorting is a necessary tool for machine learning, to create algorithms (k-NN) or test-time metrics like top-k classification accuracy or losses based on the rank. Nevertheless it seems to be a difficult task for automatically differentiable pipelines in DL. Sorting gives us two vectors, this application is not differentiable as we are working with integer-valued permutation. In the paper they aim to implement a differentiable proxy of the basic approach. The article conceive this proxy by thinking of an optimal assignment problem. We sort n values by matching them to a probability measure supported on any increasing family of n target values. Therefore we are considering Optimal Transport (OT) as a relaxation of the basic problem allowing us to extend rank and sort operators using probability measures. The auxiliary measure will be supported on m increasing values with m $\ne$ n. Introducing regularization with an entropic penalty and applying Sinkhorn iterations will allow to gain back differentiable operators. The smooth approximation of rank and sort allow to use the 0/1 loss and the quantile regression loss.
yukubo
# Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """Multi-threaded word2vec mini-batched skip-gram model. Trains the model described in: (Mikolov, et. al.) Efficient Estimation of Word Representations in Vector Space ICLR 2013. http://arxiv.org/abs/1301.3781 This model does traditional minibatching. The key ops used are: * placeholder for feeding in tensors for each example. * embedding_lookup for fetching rows from the embedding matrix. * sigmoid_cross_entropy_with_logits to calculate the loss. * GradientDescentOptimizer for optimizing the loss. * skipgram custom op that does input processing. """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import os import sys import threading import time from six.moves import xrange # pylint: disable=redefined-builtin import numpy as np import tensorflow as tf from tensorflow.models.embedding import gen_word2vec as word2vec flags = tf.app.flags flags.DEFINE_string("save_path", None, "Directory to write the model and " "training summaries.") flags.DEFINE_string("train_data", None, "Training text file. " "E.g., unzipped file http://mattmahoney.net/dc/text8.zip.") flags.DEFINE_string( "eval_data", None, "File consisting of analogies of four tokens." "embedding 2 - embedding 1 + embedding 3 should be close " "to embedding 4." "E.g. https://word2vec.googlecode.com/svn/trunk/questions-words.txt.") flags.DEFINE_integer("embedding_size", 200, "The embedding dimension size.") flags.DEFINE_integer( "epochs_to_train", 15, "Number of epochs to train. Each epoch processes the training data once " "completely.") flags.DEFINE_float("learning_rate", 0.2, "Initial learning rate.") flags.DEFINE_integer("num_neg_samples", 100, "Negative samples per training example.") flags.DEFINE_integer("batch_size", 16, "Number of training examples processed per step " "(size of a minibatch).") flags.DEFINE_integer("concurrent_steps", 12, "The number of concurrent training steps.") flags.DEFINE_integer("window_size", 5, "The number of words to predict to the left and right " "of the target word.") flags.DEFINE_integer("min_count", 5, "The minimum number of word occurrences for it to be " "included in the vocabulary.") flags.DEFINE_float("subsample", 1e-3, "Subsample threshold for word occurrence. Words that appear " "with higher frequency will be randomly down-sampled. Set " "to 0 to disable.") flags.DEFINE_boolean( "interactive", False, "If true, enters an IPython interactive session to play with the trained " "model. E.g., try model.analogy('france', 'paris', 'russia') and " "model.nearby(['proton', 'elephant', 'maxwell']") flags.DEFINE_integer("statistics_interval", 5, "Print statistics every n seconds.") flags.DEFINE_integer("summary_interval", 5, "Save training summary to file every n seconds (rounded " "up to statistics interval.") flags.DEFINE_integer("checkpoint_interval", 600, "Checkpoint the model (i.e. save the parameters) every n " "seconds (rounded up to statistics interval.") FLAGS = flags.FLAGS class Options(object): """Options used by our word2vec model.""" def __init__(self): # Model options. # Embedding dimension. self.emb_dim = FLAGS.embedding_size # Training options. # The training text file. self.train_data = FLAGS.train_data # Number of negative samples per example. self.num_samples = FLAGS.num_neg_samples # The initial learning rate. self.learning_rate = FLAGS.learning_rate # Number of epochs to train. After these many epochs, the learning # rate decays linearly to zero and the training stops. self.epochs_to_train = FLAGS.epochs_to_train # Concurrent training steps. self.concurrent_steps = FLAGS.concurrent_steps # Number of examples for one training step. self.batch_size = FLAGS.batch_size # The number of words to predict to the left and right of the target word. self.window_size = FLAGS.window_size # The minimum number of word occurrences for it to be included in the # vocabulary. self.min_count = FLAGS.min_count # Subsampling threshold for word occurrence. self.subsample = FLAGS.subsample # How often to print statistics. self.statistics_interval = FLAGS.statistics_interval # How often to write to the summary file (rounds up to the nearest # statistics_interval). self.summary_interval = FLAGS.summary_interval # How often to write checkpoints (rounds up to the nearest statistics # interval). self.checkpoint_interval = FLAGS.checkpoint_interval # Where to write out summaries. self.save_path = FLAGS.save_path # Eval options. # The text file for eval. self.eval_data = FLAGS.eval_data class Word2Vec(object): """Word2Vec model (Skipgram).""" def __init__(self, options, session): self._options = options self._session = session self._word2id = {} self._id2word = [] self.build_graph() self.build_eval_graph() self.save_vocab() self._read_analogies() def _read_analogies(self): """Reads through the analogy question file. Returns: questions: a [n, 4] numpy array containing the analogy question's word ids. questions_skipped: questions skipped due to unknown words. """ questions = [] questions_skipped = 0 with open(self._options.eval_data, "rb") as analogy_f: for line in analogy_f: if line.startswith(b":"): # Skip comments. continue words = line.strip().lower().split(b" ") ids = [self._word2id.get(w.strip()) for w in words] if None in ids or len(ids) != 4: questions_skipped += 1 else: questions.append(np.array(ids)) print("Eval analogy file: ", self._options.eval_data) print("Questions: ", len(questions)) print("Skipped: ", questions_skipped) self._analogy_questions = np.array(questions, dtype=np.int32) def forward(self, examples, labels): """Build the graph for the forward pass.""" opts = self._options # Declare all variables we need. # Embedding: [vocab_size, emb_dim] init_width = 0.5 / opts.emb_dim emb = tf.Variable( tf.random_uniform( [opts.vocab_size, opts.emb_dim], -init_width, init_width), name="emb") self._emb = emb # Softmax weight: [vocab_size, emb_dim]. Transposed. sm_w_t = tf.Variable( tf.zeros([opts.vocab_size, opts.emb_dim]), name="sm_w_t") # Softmax bias: [emb_dim]. sm_b = tf.Variable(tf.zeros([opts.vocab_size]), name="sm_b") # Global step: scalar, i.e., shape []. self.global_step = tf.Variable(0, name="global_step") # Nodes to compute the nce loss w/ candidate sampling. labels_matrix = tf.reshape( tf.cast(labels, dtype=tf.int64), [opts.batch_size, 1]) # Negative sampling. sampled_ids, _, _ = (tf.nn.fixed_unigram_candidate_sampler( true_classes=labels_matrix, num_true=1, num_sampled=opts.num_samples, unique=True, range_max=opts.vocab_size, distortion=0.75, unigrams=opts.vocab_counts.tolist())) # Embeddings for examples: [batch_size, emb_dim] example_emb = tf.nn.embedding_lookup(emb, examples) # Weights for labels: [batch_size, emb_dim] true_w = tf.nn.embedding_lookup(sm_w_t, labels) # Biases for labels: [batch_size, 1] true_b = tf.nn.embedding_lookup(sm_b, labels) # Weights for sampled ids: [num_sampled, emb_dim] sampled_w = tf.nn.embedding_lookup(sm_w_t, sampled_ids) # Biases for sampled ids: [num_sampled, 1] sampled_b = tf.nn.embedding_lookup(sm_b, sampled_ids) # True logits: [batch_size, 1] true_logits = tf.reduce_sum(tf.mul(example_emb, true_w), 1) + true_b # Sampled logits: [batch_size, num_sampled] # We replicate sampled noise lables for all examples in the batch # using the matmul. sampled_b_vec = tf.reshape(sampled_b, [opts.num_samples]) sampled_logits = tf.matmul(example_emb, sampled_w, transpose_b=True) + sampled_b_vec return true_logits, sampled_logits def nce_loss(self, true_logits, sampled_logits): """Build the graph for the NCE loss.""" # cross-entropy(logits, labels) opts = self._options true_xent = tf.nn.sigmoid_cross_entropy_with_logits( true_logits, tf.ones_like(true_logits)) sampled_xent = tf.nn.sigmoid_cross_entropy_with_logits( sampled_logits, tf.zeros_like(sampled_logits)) # NCE-loss is the sum of the true and noise (sampled words) # contributions, averaged over the batch. nce_loss_tensor = (tf.reduce_sum(true_xent) + tf.reduce_sum(sampled_xent)) / opts.batch_size return nce_loss_tensor def optimize(self, loss): """Build the graph to optimize the loss function.""" # Optimizer nodes. # Linear learning rate decay. opts = self._options words_to_train = float(opts.words_per_epoch * opts.epochs_to_train) lr = opts.learning_rate * tf.maximum( 0.0001, 1.0 - tf.cast(self._words, tf.float32) / words_to_train) self._lr = lr optimizer = tf.train.GradientDescentOptimizer(lr) train = optimizer.minimize(loss, global_step=self.global_step, gate_gradients=optimizer.GATE_NONE) self._train = train def build_eval_graph(self): """Build the eval graph.""" # Eval graph # Each analogy task is to predict the 4th word (d) given three # words: a, b, c. E.g., a=italy, b=rome, c=france, we should # predict d=paris. # The eval feeds three vectors of word ids for a, b, c, each of # which is of size N, where N is the number of analogies we want to # evaluate in one batch. analogy_a = tf.placeholder(dtype=tf.int32) # [N] analogy_b = tf.placeholder(dtype=tf.int32) # [N] analogy_c = tf.placeholder(dtype=tf.int32) # [N] # Normalized word embeddings of shape [vocab_size, emb_dim]. nemb = tf.nn.l2_normalize(self._emb, 1) # Each row of a_emb, b_emb, c_emb is a word's embedding vector. # They all have the shape [N, emb_dim] a_emb = tf.gather(nemb, analogy_a) # a's embs b_emb = tf.gather(nemb, analogy_b) # b's embs c_emb = tf.gather(nemb, analogy_c) # c's embs # We expect that d's embedding vectors on the unit hyper-sphere is # near: c_emb + (b_emb - a_emb), which has the shape [N, emb_dim]. target = c_emb + (b_emb - a_emb) # Compute cosine distance between each pair of target and vocab. # dist has shape [N, vocab_size]. dist = tf.matmul(target, nemb, transpose_b=True) # For each question (row in dist), find the top 4 words. _, pred_idx = tf.nn.top_k(dist, 4) # Nodes for computing neighbors for a given word according to # their cosine distance. nearby_word = tf.placeholder(dtype=tf.int32) # word id nearby_emb = tf.gather(nemb, nearby_word) nearby_dist = tf.matmul(nearby_emb, nemb, transpose_b=True) nearby_val, nearby_idx = tf.nn.top_k(nearby_dist, min(1000, self._options.vocab_size)) # Nodes in the construct graph which are used by training and # evaluation to run/feed/fetch. self._analogy_a = analogy_a self._analogy_b = analogy_b self._analogy_c = analogy_c self._analogy_pred_idx = pred_idx self._nearby_word = nearby_word self._nearby_val = nearby_val self._nearby_idx = nearby_idx def build_graph(self): """Build the graph for the full model.""" opts = self._options # The training data. A text file. (words, counts, words_per_epoch, self._epoch, self._words, examples, labels) = word2vec.skipgram(filename=opts.train_data, batch_size=opts.batch_size, window_size=opts.window_size, min_count=opts.min_count, subsample=opts.subsample) (opts.vocab_words, opts.vocab_counts, opts.words_per_epoch) = self._session.run([words, counts, words_per_epoch]) opts.vocab_size = len(opts.vocab_words) print("Data file: ", opts.train_data) print("Vocab size: ", opts.vocab_size - 1, " + UNK") print("Words per epoch: ", opts.words_per_epoch) self._examples = examples self._labels = labels self._id2word = opts.vocab_words for i, w in enumerate(self._id2word): self._word2id[w] = i true_logits, sampled_logits = self.forward(examples, labels) loss = self.nce_loss(true_logits, sampled_logits) tf.scalar_summary("NCE loss", loss) self._loss = loss self.optimize(loss) # Properly initialize all variables. tf.initialize_all_variables().run() self.saver = tf.train.Saver() def save_vocab(self): """Save the vocabulary to a file so the model can be reloaded.""" opts = self._options with open(os.path.join(opts.save_path, "vocab.txt"), "w") as f: for i in xrange(opts.vocab_size): f.write("%s %d\n" % (tf.compat.as_text(opts.vocab_words[i]), opts.vocab_counts[i])) def _train_thread_body(self): initial_epoch, = self._session.run([self._epoch]) while True: _, epoch = self._session.run([self._train, self._epoch]) if epoch != initial_epoch: break def train(self): """Train the model.""" opts = self._options initial_epoch, initial_words = self._session.run([self._epoch, self._words]) summary_op = tf.merge_all_summaries() summary_writer = tf.train.SummaryWriter(opts.save_path, graph_def=self._session.graph_def) workers = [] for _ in xrange(opts.concurrent_steps): t = threading.Thread(target=self._train_thread_body) t.start() workers.append(t) last_words, last_time, last_summary_time = initial_words, time.time(), 0 last_checkpoint_time = 0 while True: time.sleep(opts.statistics_interval) # Reports our progress once a while. (epoch, step, loss, words, lr) = self._session.run( [self._epoch, self.global_step, self._loss, self._words, self._lr]) now = time.time() last_words, last_time, rate = words, now, (words - last_words) / ( now - last_time) print("Epoch %4d Step %8d: lr = %5.3f loss = %6.2f words/sec = %8.0f\r" % (epoch, step, lr, loss, rate), end="") sys.stdout.flush() if now - last_summary_time > opts.summary_interval: summary_str = self._session.run(summary_op) summary_writer.add_summary(summary_str, step) last_summary_time = now if now - last_checkpoint_time > opts.checkpoint_interval: self.saver.save(self._session, opts.save_path + "model", global_step=step.astype(int)) last_checkpoint_time = now if epoch != initial_epoch: break for t in workers: t.join() return epoch def _predict(self, analogy): """Predict the top 4 answers for analogy questions.""" idx, = self._session.run([self._analogy_pred_idx], { self._analogy_a: analogy[:, 0], self._analogy_b: analogy[:, 1], self._analogy_c: analogy[:, 2] }) return idx def eval(self): """Evaluate analogy questions and reports accuracy.""" # How many questions we get right at precision@1. correct = 0 total = self._analogy_questions.shape[0] start = 0 while start < total: limit = start + 2500 sub = self._analogy_questions[start:limit, :] idx = self._predict(sub) start = limit for question in xrange(sub.shape[0]): for j in xrange(4): if idx[question, j] == sub[question, 3]: # Bingo! We predicted correctly. E.g., [italy, rome, france, paris]. correct += 1 break elif idx[question, j] in sub[question, :3]: # We need to skip words already in the question. continue else: # The correct label is not the precision@1 break print() print("Eval %4d/%d accuracy = %4.1f%%" % (correct, total, correct * 100.0 / total)) def analogy(self, w0, w1, w2): """Predict word w3 as in w0:w1 vs w2:w3.""" wid = np.array([[self._word2id.get(w, 0) for w in [w0, w1, w2]]]) idx = self._predict(wid) for c in [self._id2word[i] for i in idx[0, :]]: if c not in [w0, w1, w2]: return c return "unknown" def nearby(self, words, num=20): """Prints out nearby words given a list of words.""" ids = np.array([self._word2id.get(x, 0) for x in words]) vals, idx = self._session.run( [self._nearby_val, self._nearby_idx], {self._nearby_word: ids}) for i in xrange(len(words)): print("\n%s\n=====================================" % (words[i])) for (neighbor, distance) in zip(idx[i, :num], vals[i, :num]): print("%-20s %6.4f" % (self._id2word[neighbor], distance)) def _start_shell(local_ns=None): # An interactive shell is useful for debugging/development. import IPython user_ns = {} if local_ns: user_ns.update(local_ns) user_ns.update(globals()) IPython.start_ipython(argv=[], user_ns=user_ns) def main(_): """Train a word2vec model.""" if not FLAGS.train_data or not FLAGS.eval_data or not FLAGS.save_path: print("--train_data --eval_data and --save_path must be specified.") sys.exit(1) opts = Options() with tf.Graph().as_default(), tf.Session() as session: with tf.device("/cpu:0"): model = Word2Vec(opts, session) for _ in xrange(opts.epochs_to_train): model.train() # Process one epoch model.eval() # Eval analogies. # Perform a final save. model.saver.save(session, os.path.join(opts.save_path, "model.ckpt"), global_step=model.global_step) if FLAGS.interactive: # E.g., # [0]: model.analogy('france', 'paris', 'russia') # [1]: model.nearby(['proton', 'elephant', 'maxwell']) _start_shell(locals()) if __name__ == "__main__": tf.app.run()
tesatory
Word vectors for sentence completion task
ABgit111
Official Repository for Paper: Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging
katoro8989
Source code for paper "DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging" (ICLR2026)