Found 5,998 repositories(showing 30)
mourarthur
A collection of awesome papers, articles and various resources on credit and credit risk modeling
finlytics-hub
A comprehensive credit risk model and scorecard using data from Lending Club
djbolder
Credit-Risk Modelling Libraries
ing-bank
scikit-learn compatible tools for building credit risk acceptance models
vishnukanduri
Modeled the credit risk associated with consumer loans. Performed exploratory data analysis (EDA), preprocessing of continuous and discrete variables using various techniques depending on the feature. Checked for missing values and cleaned the data. Built the probability of default model using Logistic Regression. Visualized all the results. Computed Weight of Evidence and price elasticities.
Aryia-Behroziuan
An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[68] Decision trees Main article: Decision tree learning Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making. Support vector machines Main article: Support vector machines Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[69] An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. Illustration of linear regression on a data set. Regression analysis Main article: Regression analysis Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization (mathematics) methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[70]), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher-dimensional space. Bayesian networks Main article: Bayesian network A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. Genetic algorithms Main article: Genetic algorithm A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[71][72] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[73] Training models Usually, machine learning models require a lot of data in order for them to perform well. Usually, when training a machine learning model, one needs to collect a large, representative sample of data from a training set. Data from the training set can be as varied as a corpus of text, a collection of images, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model. Federated learning Main article: Federated learning Federated learning is an adapted form of distributed artificial intelligence to training machine learning models that decentralizes the training process, allowing for users' privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices. For example, Gboard uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[74] Applications There are many applications for machine learning, including: Agriculture Anatomy Adaptive websites Affective computing Banking Bioinformatics Brain–machine interfaces Cheminformatics Citizen science Computer networks Computer vision Credit-card fraud detection Data quality DNA sequence classification Economics Financial market analysis[75] General game playing Handwriting recognition Information retrieval Insurance Internet fraud detection Linguistics Machine learning control Machine perception Machine translation Marketing Medical diagnosis Natural language processing Natural language understanding Online advertising Optimization Recommender systems Robot locomotion Search engines Sentiment analysis Sequence mining Software engineering Speech recognition Structural health monitoring Syntactic pattern recognition Telecommunication Theorem proving Time series forecasting User behavior analytics In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[76] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and they changed their recommendation engine accordingly.[77] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of machine learning to predict the financial crisis.[78] In 2012, co-founder of Sun Microsystems, Vinod Khosla, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[79] In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings and that it may have revealed previously unrecognized influences among artists.[80] In 2019 Springer Nature published the first research book created using machine learning.[81] Limitations Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.[82][83][84] Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[85] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after a collision.[86] Attempts to use machine learning in healthcare with the IBM Watson system failed to deliver even after years of time and billions of dollars invested.[87][88] Bias Main article: Algorithmic bias Machine learning approaches in particular can suffer from different data biases. A machine learning system trained on current customers only may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[89] Language models learned from data have been shown to contain human-like biases.[90][91] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[92][93] In 2015, Google photos would often tag black people as gorillas,[94] and in 2018 this still was not well resolved, but Google reportedly was still using the workaround to remove all gorillas from the training data, and thus was not able to recognize real gorillas at all.[95] Similar issues with recognizing non-white people have been found in many other systems.[96] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[97] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[98] Concern for fairness in machine learning, that is, reducing bias in machine learning and propelling its use for human good is increasingly expressed by artificial intelligence scientists, including Fei-Fei Li, who reminds engineers that "There’s nothing artificial about AI...It’s inspired by people, it’s created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility.”[99] Model assessments Classification of machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data in a training and test set (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set. In comparison, the K-fold-cross-validation method randomly partitions the data into K subsets and then K experiments are performed each respectively considering 1 subset for evaluation and the remaining K-1 subsets for training the model. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[100] In addition to overall accuracy, investigators frequently report sensitivity and specificity meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[101] Ethics Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[102] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[103][104] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning. Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[105][106] Other forms of ethical challenges, not related to personal biases, are more seen in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States where there is a long-standing ethical dilemma of improving health care, but also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases are addressed.[107] Hardware Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[108] By 2019, graphic processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[109] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[110][111] Software Software suites containing a variety of machine learning algorithms include the following: Free and open-source so
levist7
Credit Risk Modelling | Calculation of PD, LGD, EDA and EL with Machine Learning in Python
ayhandis
A Credit Risk Scoring Modeling and Validation Package
xRiskLab
Explainable Boosted Scoring with Python: turning XGBoost, LightGBM, and CatBoost into explainable scorecards
DavieObi
This project focuses on predicting customer churn for a bank using machine learning. By analyzing features like credit score, age, balance, and geography, the model identifies at-risk customers, uncovers key churn drivers, and provides insights to help the bank implement targeted retention strategies.
nishant1005
A study and comparison of Risk Modeling algorithms (Capstone Project)
🍔 Credit 🍏 Card 🍎 Fraud 🍑 Detection 🚂 With Machine ✈ Learning 🚁Algorithms is 🚀 a data science 🚟 focused on 🛫 building 🚒 predictive 🚞 models to 🚈 detect 🛸credit 🚛 transactions ⛵ Using 🧸 supervised ⚽ learning ⚾ algorithms 🥎 it analyzes 🏀 transaction 🏐 patterns 🏈 and identifies 🧵 anomalies 🥌 to reduce 🕹 financial 🎮 fraud risks
JakobLS
MLOps for deploying a Credit Risk model
abhashpanwar
credit-risk-modeling
shawn-y-sun
Credit Risk Modeling to Compute Expected Loss of Loans (logistic regression, linear regression)
Amatofrancesco99
The aim is to understand which are the key factors for a certain level of credit risk to occur. In addition, some ML models capable to predict the credit risk level for a company in an year - given past years data - have been built and compared.
aniruddhachoudhury
Credit Risk Model on Machine learning and prediction
nafiul-araf
This credit risk modeling project for a modest-scale finance company encompasses the entire process, from data collection and exploratory data analysis (EDA) to deployment.
HongzoengNg
This Model is created to describe the credit risk of a listed company
open-risk
openLGD is a Python powered library for the statistical estimation of Credit Risk Loss Given Default models. It can be used both as standalone library and in a federated learning context where data remain in distinct (separate) servers
yaronshap
Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application to Credit-Risk Evaluation" by Cynthia Rudin and Yaron Shaposhnik https://ssrn.com/abstract=3395422
brunokatekawa
[Project repo] Improving business with a credit risk model
allmeidaapedro
This is an end to end machine learning project using Random Forest to predict credit risk of German Bank's customers. By employing predictive models, the bank can make informed decisions that balance profit generation with prudent risk management, ultimately benefiting both the institution and its customers.
Replication codes for Deep Learning Credit Risk Modeling by Manzo, Qiao
ajtulloch
Code used to implement various stochastic intensity models for univariate and multivariate credit risk models.
yqmark
Privacy Policy introduction We understand the importance of personal information to you and will do our utmost to protect your personal information. We are committed to maintaining your trust in us and to abide by the following principles to protect your personal information: the principle of consistency of rights and responsibilities, the principle of purpose , choose the principle of consent, at least the principle of sufficient use, ensure the principle of security, the principle of subject participation, the principle of openness and transparency, and so on. At the same time, we promise that we will take appropriate security measures to protect your personal information according to the industry's mature security solutions. In view of this, we have formulated this "Private Privacy Policy" (hereinafter referred to as "this policy" /This Privacy Policy") and remind you: This policy applies to products or services on this platform. If the products or services provided by the platform are used in the products or services of our affiliates (for example, using the platform account directly) but there is no independent privacy policy, this policy also applies to the products or services. It is important to note that this policy does not apply to other third-party services provided by you, nor to products or services on this platform that have been independently set up with a privacy policy. Before using the products or services on this platform, please read and understand this policy carefully, and use the related products or services after confirming that you fully understand and agree. By using the products or services on this platform, you understand and agree to this policy. If you have any questions, comments or suggestions about the content of this policy, you can contact us through various contact methods provided by this platform. This privacy policy section will help you understand the following: How we collect and use your personal information How do we use cookies and similar technologies? How do we share, transfer, and publicly disclose your personal information? How we protect your personal information How do you manage your personal information? How do we deal with the personal information of minors? How your personal information is transferred globally How to update this privacy policy How to contact us 一、How we collect and use your personal information Personal information refers to various information recorded electronically or otherwise that can identify a specific natural person or reflect the activities of a particular natural person, either alone or in combination with other information. We collect and use your information for the purposes described in this policy. Personal information: (一)Help you become our user To create an account so that we can serve you, you will need to provide the following information: your nickname, avatar, gender, date of birth, mobile number/signal/QQ number, and create a username and password. During the registration process, if you provide the following additional information to supplement your personal information, it will help us to provide you with better service and experience: your real name, real ID information, hometown, emotional status, constellation, occupation, school Your real avatar. However, if you do not provide this information, it will not affect the basic functions of using the platform products or services. The above information provided by you will continue to authorize us during your use of the Service. When you voluntarily cancel your account, we will make it anonymous or delete your personal information as soon as possible in accordance with applicable laws and regulations. (二)Show and push goods or services for you In order to improve our products or services and provide you with personalized information search and transaction services, we will extract your browsing, search preferences, behavioral habits based on your browsing and search history, device information, location information, and transaction information. Features such as location information, indirect crowd portraits based on feature tags, and display and push information. If you do not want to accept commercials that we send to you, you can cancel them at any time through the product unsubscribe feature. (三)Provide goods or services to you 1、Information you provide to us Relevant personal information that you provide to us when registering for an account or using our services, such as phone numbers, emails, bank card numbers or Alipay accounts; The shared information that you provide to other parties through our services and the information that you store when you use our services. Before providing the platform with the aforementioned personal information of the other party, you need to ensure that you have obtained your authorization. 2、Information we collect during your use of the service In order to provide you with page display and search results that better suit your needs, understand product suitability, and identify account anomalies, we collect and correlate information about the services you use and how they are used, including: Device Information: We will receive and record information about the device you are using (such as device model, operating system version, device settings, unique device identifier, etc.) based on the specific permissions you have granted during software installation and use. Information about the location of the device (such as Idiv address, GdivS location, and Wi-Fi that can provide relevant information) Sensor information such as access points, Bluetooth and base stations. Since the services we provide are based on the mobile social services provided by the geographic location, you confirm that the successful registration of the "this platform" account is deemed to confirm the authorization to extract, disclose and use your geographic location information. . If you need to terminate your location information to other users, you can set it to be invisible at any time. Log information: When you use our website or the products or services provided by the client, we will automatically collect your detailed usage of our services as a related web log. For example, your search query content, Idiv address, browser type, telecom carrier, language used, date and time of access, and web page history you visit. Please note that separate device information, log information, etc. are information that does not identify a particular natural person. If we combine such non-personal information with other information to identify a particular natural person or use it in conjunction with personal information, such non-personal information will be treated as personal information during the combined use, except for your authorization. Or as otherwise provided by laws and regulations, we will anonymize and de-identify such personal information. When you contact us, we may save information such as your communication/call history and content or the contact information you left in order to contact you or help you solve the problem or to document the resolution and results of the problem. 3、Your personal information collected through indirect access You can use the products or services provided by our affiliates through the link of the platform provided by our platform account. In order to facilitate our one-stop service based on the linked accounts and facilitate your unified management, we will show you on this platform. Information or recommendations for information you are interested in, including information from live broadcasts and games. You can discover and use the above services through the homepage of the platform, "More" and other functions. When you use the above services through our products or services, you authorize us to receive, aggregate, and analyze from our affiliates based on actual business and cooperation needs, we confirm that their source is legal or that you authorize to consent to your personal information provided to us or Trading Information. If you refuse to provide the above information or refuse to authorize, you may not be able to use the corresponding products or services of our affiliates, or can not display relevant information, but does not affect the use of the platform to browse, chat, release dynamics and other core services. (四)Provide you with security Please note that in order to ensure the authenticity of the user's identity and provide you with better security, you can provide us with identification information such as identity card, military officer's card, passport, driver's license, social security card, residence permit, facial identification, and other biometric information. Personally sensitive information such as Sesame Credit and other real-name certifications. If you refuse to provide the above information, you may not be able to use services such as account management, live broadcast, and continuing risky transactions, but it will not affect your use of browsing, chat and other services. To improve the security of your services provided by us and our affiliates and partners, protect the personal and property of you or other users or the public from being compromised, and better prevent phishing websites, fraud, network vulnerabilities, computer viruses, cyber attacks , security risks such as network intrusion, more accurately identify violations of laws and regulations or the relevant rules of the platform, we may use or integrate your user information, transaction information, equipment information, related web logs and our affiliates, partners to obtain You authorize or rely on the information shared by law to comprehensively judge your account and transaction risks, conduct identity verification, detect and prevent security incidents, and take necessary records, audits, analysis, and disposal measures in accordance with the law. (五)Other uses When we use the information for other purposes not covered by this policy, or if the information collected for a specific purpose is used for other purposes, you will be asked for your prior consent. (六)Exception for authorization of consent According to relevant laws and regulations, collecting your personal information in the following situations does not require your authorized consent: 1、Related to national security and national defense security; 2、Related to public safety, public health, and major public interests; 3、Related to criminal investigation, prosecution, trial and execution of judgments, etc.; 4、It is difficult to obtain your own consent for the maintenance of the important legal rights of the personal information or other individuals’ lives and property; 5、The personal information collected is disclosed to the public by yourself; 二、How do we use cookies and similar technologies? (一)Cookies To ensure that your site is up and running, to give you an easier access experience, and to recommend content that may be of interest to you, we store a small data file called a cookie on your computer or mobile device. Cookies usually contain an identifier, a site name, and some numbers and characters. With cookies, websites can store data such as your preferences. (二)Website Beacons and Pixel Labels In addition to cookies, we use other technologies like web beacons and pixel tags on our website. For example, the email we send to you may contain an address link to the content of our website. If you click on the link, we will track the click to help us understand your product or service preferences so that we can proactively improve customer service. Experience. A web beacon is usually a transparent image that is embedded in a website or email. With the pixel tags in the email, we can tell if the email is open. If you don't want your event to be tracked this way, you can unsubscribe from our mailing list at any time. 三、How do we share, transfer, and publicly disclose your personal information? (一)shared We do not share your personal information with companies, organizations, and individuals other than the platform's service providers, with the following exceptions: 1、Sharing with explicit consent: We will share your personal information with others after obtaining your explicit consent. 2、Sharing under statutory circumstances: We may share your personal information in accordance with laws and regulations, litigation dispute resolution needs, or in accordance with the requirements of the administrative and judicial authorities. 3. Sharing with affiliates: In order to facilitate our services to you based on linked accounts, we recommend information that may be of interest to you or protect the personal property of affiliates or other users or the public of this platform from being infringed. Personal information may be shared with our affiliates. We will only share the necessary personal information (for example, to facilitate the use of our affiliated company products or services, we will share your necessary account information with affiliates) if we share your personal sensitive information or affiliate changes The use of personal information and the purpose of processing will be re-examined for your authorization. 4. Sharing with Authorized Partners: For the purposes stated in this Privacy Policy, some of our services will be provided by us and our authorized partners. We may share some of your personal information with our partners to provide better customer service and user experience. For example, arrange a partner to provide services. We will only share your personal information for legitimate, legitimate, necessary, specific, and specific purposes, and will only share the personal information necessary to provide the service. Our partners are not authorized to use shared personal information for other purposes unrelated to the product or service. Currently, our authorized partners include the following types: (2) Suppliers, service providers and other partners. We send information to suppliers, service providers and other partners who support our business, including providing technical infrastructure services, analyzing how our services are used, measuring the effectiveness of advertising and services, providing customer service, and facilitating payments. Or conduct academic research and investigations. (1) Authorized partners in advertising and analytics services. We will not use your personally identifiable information (information that identifies you, such as your name or email address, which can be used to contact you or identify you) and provide advertising and analytics services, unless you have your permission. Shared by partners. We will provide these partners with information about their advertising coverage and effectiveness, without providing your personally identifiable information, or we may aggregate this information so that it does not identify you personally. For example, we’ll only tell advertisers how effective their ads are when they agree to comply with our advertising guidelines, or how many people see their ads or install apps after seeing ads, or work with them. Partners provide statistical information that does not identify individuals (eg “male, 25-29 years old, in Beijing”) to help them understand their audience or customers. For companies, organizations and individuals with whom we share personal information, we will enter into strict data protection agreements with them to process individuals in accordance with our instructions, this Privacy Policy and any other relevant confidentiality and security measures. information. (2) Transfer We do not transfer your personal information to any company, organization or individual, except: Transfer with the express consent: After obtaining your explicit consent, we will transfer your personal information to other parties; 2, in the case of mergers, acquisitions or bankruptcy liquidation, or other circumstances involving mergers, acquisitions or bankruptcy liquidation, if it involves the transfer of personal information, we will require new companies and organizations that hold your personal information to continue to receive This policy is bound, otherwise we will ask the company, organization and individual to re-seek your consent. (3) Public disclosure We will only publicly disclose your personal information in the following circumstances: We may publicly disclose your personal information by obtaining your explicit consent or based on your active choice; 2, if we determine that you have violated laws and regulations or serious violations of the relevant rules of the platform, or to protect the personal safety of the platform and its affiliates users or the public from infringement, we may be based on laws and regulations or The relevant agreement rules of this platform disclose your personal information, including related violations, and the measures that the platform has taken against you, with your consent. (4) Exceptions for prior authorization of consent when sharing, transferring, and publicly disclosing personal information In the following situations, sharing, transferring, and publicly disclosing your personal information does not require prior authorization from you: Related to national security and national defense security; Related to public safety, public health, and major public interests; 3, related to criminal investigation, prosecution, trial and judgment execution; 4, in order to protect your or other individuals' life, property and other important legal rights but it is difficult to get my consent; Personal information that you disclose to the public on your own; Collect personal information from legally publicly disclosed information, such as legal news reports and government information disclosure. According to the law, sharing, transferring and de-identifying personal information, and ensuring that the data recipient cannot recover and re-identify the personal information subject, does not belong to the external sharing, transfer and public disclosure of personal information. The preservation and processing of the class data will not require additional notice and your consent. How do we protect your personal information? (1) We have taken reasonable and feasible security measures in accordance with the industry's general solutions to protect the security of personal information provided by you, and to prevent unauthorized access, public disclosure, use, modification, damage or loss of personal information. For example, SSL (Secure Socket) when exchanging data (such as credit card information) between your browser and the server Layer) protocol encryption protection; we use encryption technology to improve the security of personal information; we use a trusted protection mechanism to prevent personal information from being maliciously attacked; we will deploy access control mechanisms to ensure that only authorized personnel can access individuals Information; and we will conduct security and privacy protection training courses to enhance employees' awareness of the importance of protecting personal information. (2) We have advanced data security management system around the data life cycle, which enhances the security of the whole system from organizational construction, system design, personnel management, product technology and other aspects. (3) We will take reasonable and feasible measures and try our best to avoid collecting irrelevant personal information. We will only retain your personal information for the period of time required to achieve the purposes stated in this policy, unless the retention period is extended or permitted by law. (4) The Internet is not an absolutely secure environment. We strongly recommend that you do not use personal communication methods that are not recommended by this platform. You can connect and share with each other through our services. When you create communications, transactions, or sharing through our services, you can choose who you want to communicate, trade, or share as a third party who can see your trading content, contact information, exchange information, or share content. If you find that your personal information, especially your account or password, has been leaked, please contact our customer service immediately so that we can take appropriate measures according to your application. Please note that the information you voluntarily share or even share publicly when using our services may involve personal information of you or others or even sensitive personal information, such as when you post a news or choose to upload in public in group chats, circles, etc. A picture containing personal information. Please consider more carefully whether you share or even share information publicly when using our services. Please use complex passwords to help us keep your account secure. We will do our best to protect the security of any information you send us. At the same time, we will report the handling of personal information security incidents in accordance with the requirements of the regulatory authorities. V. How your personal information is transferred globally Personal information collected and generated by us during our operations in the People's Republic of China is stored in China, with the following exceptions: Laws and regulations have clear provisions; 2, get your explicit authorization; 3, you through the Internet for cross-border live broadcast / release dynamics and other personal initiatives. In response to the above, we will ensure that your personal information is adequately protected in accordance with this Privacy Policy.
maixbach
Credit Risk Analysis using Machine Learning models
wangkuiyi
This is an tutorial on credit risk model designed for peer-to-peer lending (Internet finance)
Credit Score Provider for the Faker Python Project. Use this to generate fake but realistic-looking consumer credit scores aligning to the most prevalent risk models (FICO, VantageScore, etc.)
Bitcoin: A Peer-to-Peer Electronic Cash System Satoshi Nakamoto satoshin@gmx.com www.bitcoin.org Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone. 1. Introduction Commerce on the Internet has come to rely almost exclusively on financial institutions serving as trusted third parties to process electronic payments. While the system works well enough for most transactions, it still suffers from the inherent weaknesses of the trust based model. Completely non-reversible transactions are not really possible, since financial institutions cannot avoid mediating disputes. The cost of mediation increases transaction costs, limiting the minimum practical transaction size and cutting off the possibility for small casual transactions, and there is a broader cost in the loss of ability to make non-reversible payments for non- reversible services. With the possibility of reversal, the need for trust spreads. Merchants must be wary of their customers, hassling them for more information than they would otherwise need. A certain percentage of fraud is accepted as unavoidable. These costs and payment uncertainties can be avoided in person by using physical currency, but no mechanism exists to make payments over a communications channel without a trusted party. What is needed is an electronic payment system based on cryptographic proof instead of trust, allowing any two willing parties to transact directly with each other without the need for a trusted third party. Transactions that are computationally impractical to reverse would protect sellers from fraud, and routine escrow mechanisms could easily be implemented to protect buyers. In this paper, we propose a solution to the double-spending problem using a peer-to-peer distributed timestamp server to generate computational proof of the chronological order of transactions. The system is secure as long as honest nodes collectively control more CPU power than any cooperating group of attacker nodes. 1 2. Transactions We define an electronic coin as a chain of digital signatures. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin. A payee can verify the signatures to verify the chain of ownership. Transaction Hash Transaction Hash Transaction Hash Owner 1's Public Key Owner 2's Public Key Owner 3's Public Key Owner 0's Signature Owner 1's Signature The problem of course is the payee can't verify that one of the owners did not double-spend the coin. A common solution is to introduce a trusted central authority, or mint, that checks every transaction for double spending. After each transaction, the coin must be returned to the mint to issue a new coin, and only coins issued directly from the mint are trusted not to be double-spent. The problem with this solution is that the fate of the entire money system depends on the company running the mint, with every transaction having to go through them, just like a bank. We need a way for the payee to know that the previous owners did not sign any earlier transactions. For our purposes, the earliest transaction is the one that counts, so we don't care about later attempts to double-spend. The only way to confirm the absence of a transaction is to be aware of all transactions. In the mint based model, the mint was aware of all transactions and decided which arrived first. To accomplish this without a trusted party, transactions must be publicly announced [1], and we need a system for participants to agree on a single history of the order in which they were received. The payee needs proof that at the time of each transaction, the majority of nodes agreed it was the first received. 3. Timestamp Server The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post [2-5]. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it. Hash Hash Owner 2's Signature Owner 1's Private Key Owner 2's Private Key Owner 3's Private Key Block Item Item ... 2 Block Item Item ... Verify Verify Sign Sign 4. Proof-of-Work To implement a distributed timestamp server on a peer-to-peer basis, we will need to use a proof- of-work system similar to Adam Back's Hashcash [6], rather than newspaper or Usenet posts. The proof-of-work involves scanning for a value that when hashed, such as with SHA-256, the hash begins with a number of zero bits. The average work required is exponential in the number of zero bits required and can be verified by executing a single hash. For our timestamp network, we implement the proof-of-work by incrementing a nonce in the block until a value is found that gives the block's hash the required zero bits. Once the CPU effort has been expended to make it satisfy the proof-of-work, the block cannot be changed without redoing the work. As later blocks are chained after it, the work to change the block would include redoing all the blocks after it. The proof-of-work also solves the problem of determining representation in majority decision making. If the majority were based on one-IP-address-one-vote, it could be subverted by anyone able to allocate many IPs. Proof-of-work is essentially one-CPU-one-vote. The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it. If a majority of CPU power is controlled by honest nodes, the honest chain will grow the fastest and outpace any competing chains. To modify a past block, an attacker would have to redo the proof-of-work of the block and all blocks after it and then catch up with and surpass the work of the honest nodes. We will show later that the probability of a slower attacker catching up diminishes exponentially as subsequent blocks are added. To compensate for increasing hardware speed and varying interest in running nodes over time, the proof-of-work difficulty is determined by a moving average targeting an average number of blocks per hour. If they're generated too fast, the difficulty increases. 5. Network The steps to run the network are as follows: 1) New transactions are broadcast to all nodes. 2) Each node collects new transactions into a block. 3) Each node works on finding a difficult proof-of-work for its block. 4) When a node finds a proof-of-work, it broadcasts the block to all nodes. 5) Nodes accept the block only if all transactions in it are valid and not already spent. 6) Nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash. Nodes always consider the longest chain to be the correct one and will keep working on extending it. If two nodes broadcast different versions of the next block simultaneously, some nodes may receive one or the other first. In that case, they work on the first one they received, but save the other branch in case it becomes longer. The tie will be broken when the next proof- of-work is found and one branch becomes longer; the nodes that were working on the other branch will then switch to the longer one. 3 Block Nonce Tx Tx ... Block Nonce Tx Tx ... Prev Hash Prev Hash New transaction broadcasts do not necessarily need to reach all nodes. As long as they reach many nodes, they will get into a block before long. Block broadcasts are also tolerant of dropped messages. If a node does not receive a block, it will request it when it receives the next block and realizes it missed one. 6. Incentive By convention, the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. This adds an incentive for nodes to support the network, and provides a way to initially distribute coins into circulation, since there is no central authority to issue them. The steady addition of a constant of amount of new coins is analogous to gold miners expending resources to add gold to circulation. In our case, it is CPU time and electricity that is expended. The incentive can also be funded with transaction fees. If the output value of a transaction is less than its input value, the difference is a transaction fee that is added to the incentive value of the block containing the transaction. Once a predetermined number of coins have entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free. The incentive may help encourage nodes to stay honest. If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, such rules that favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth. 7. Reclaiming Disk Space Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space. To facilitate this without breaking the block's hash, transactions are hashed in a Merkle Tree [7][2][5], with only the root included in the block's hash. Old blocks can then be compacted by stubbing off branches of the tree. The interior hashes do not need to be stored. Block Hash0 Hash1 Hash2 Hash3 Tx0 Tx1 Tx2 Tx3 Block Header (Block Hash) Prev Hash Nonce Root Hash Hash01 Hash23 Block Block Header (Block Hash) Prev Hash Nonce Root Hash Hash01 Hash23 Hash2 Hash3 Tx3 Transactions Hashed in a Merkle Tree After Pruning Tx0-2 from the Block A block header with no transactions would be about 80 bytes. If we suppose blocks are generated every 10 minutes, 80 bytes * 6 * 24 * 365 = 4.2MB per year. With computer systems typically selling with 2GB of RAM as of 2008, and Moore's Law predicting current growth of 1.2GB per year, storage should not be a problem even if the block headers must be kept in memory. 4 8. Simplified Payment Verification It is possible to verify payments without running a full network node. A user only needs to keep a copy of the block headers of the longest proof-of-work chain, which he can get by querying network nodes until he's convinced he has the longest chain, and obtain the Merkle branch linking the transaction to the block it's timestamped in. He can't check the transaction for himself, but by linking it to a place in the chain, he can see that a network node has accepted it, and blocks added after it further confirm the network has accepted it. Longest Proof-of-Work Chain Block Header Block Header Block Header Prev Hash Nonce Prev Hash Nonce Prev Hash Nonce Merkle Root Merkle Root Merkle Root Hash01 Hash23 Merkle Branch for Tx3 Hash2 Hash3 Tx3 As such, the verification is reliable as long as honest nodes control the network, but is more vulnerable if the network is overpowered by an attacker. While network nodes can verify transactions for themselves, the simplified method can be fooled by an attacker's fabricated transactions for as long as the attacker can continue to overpower the network. One strategy to protect against this would be to accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency. Businesses that receive frequent payments will probably still want to run their own nodes for more independent security and quicker verification. 9. Combining and Splitting Value Although it would be possible to handle coins individually, it would be unwieldy to make a separate transaction for every cent in a transfer. To allow value to be split and combined, transactions contain multiple inputs and outputs. Normally there will be either a single input from a larger previous transaction or multiple inputs combining smaller amounts, and at most two outputs: one for the payment, and one returning the change, if any, back to the sender. It should be noted that fan-out, where a transaction depends on several transactions, and those transactions depend on many more, is not a problem here. There is never the need to extract a complete standalone copy of a transaction's history. 5 Transaction In Out In ... ... 10. Privacy The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party. The necessity to announce all transactions publicly precludes this method, but privacy can still be maintained by breaking the flow of information in another place: by keeping public keys anonymous. The public can see that someone is sending an amount to someone else, but without information linking the transaction to anyone. This is similar to the level of information released by stock exchanges, where the time and size of individual trades, the "tape", is made public, but without telling who the parties were. Traditional Privacy Model Identities Transactions New Privacy Model Identities Transactions As an additional firewall, a new key pair should be used for each transaction to keep them from being linked to a common owner. Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner. 11. Calculations We consider the scenario of an attacker trying to generate an alternate chain faster than the honest chain. Even if this is accomplished, it does not throw the system open to arbitrary changes, such as creating value out of thin air or taking money that never belonged to the attacker. Nodes are not going to accept an invalid transaction as payment, and honest nodes will never accept a block containing them. An attacker can only try to change one of his own transactions to take back money he recently spent. The race between the honest chain and an attacker chain can be characterized as a Binomial Random Walk. The success event is the honest chain being extended by one block, increasing its lead by +1, and the failure event is the attacker's chain being extended by one block, reducing the gap by -1. The probability of an attacker catching up from a given deficit is analogous to a Gambler's Ruin problem. Suppose a gambler with unlimited credit starts at a deficit and plays potentially an infinite number of trials to try to reach breakeven. We can calculate the probability he ever reaches breakeven, or that an attacker ever catches up with the honest chain, as follows [8]: p = probability an honest node finds the next block q = probability the attacker finds the next block qz = probability the attacker will ever catch up from z blocks behind Trusted Third Party q ={ 1 if p≤q} z q/pz if pq 6 Counterparty Public Public Given our assumption that p > q, the probability drops exponentially as the number of blocks the attacker has to catch up with increases. With the odds against him, if he doesn't make a lucky lunge forward early on, his chances become vanishingly small as he falls further behind. We now consider how long the recipient of a new transaction needs to wait before being sufficiently certain the sender can't change the transaction. We assume the sender is an attacker who wants to make the recipient believe he paid him for a while, then switch it to pay back to himself after some time has passed. The receiver will be alerted when that happens, but the sender hopes it will be too late. The receiver generates a new key pair and gives the public key to the sender shortly before signing. This prevents the sender from preparing a chain of blocks ahead of time by working on it continuously until he is lucky enough to get far enough ahead, then executing the transaction at that moment. Once the transaction is sent, the dishonest sender starts working in secret on a parallel chain containing an alternate version of his transaction. The recipient waits until the transaction has been added to a block and z blocks have been linked after it. He doesn't know the exact amount of progress the attacker has made, but assuming the honest blocks took the average expected time per block, the attacker's potential progress will be a Poisson distribution with expected value: = z qp To get the probability the attacker could still catch up now, we multiply the Poisson density for each amount of progress he could have made by the probability he could catch up from that point: ∞ ke−{q/pz−k ifk≤z} ∑k=0 k!⋅ 1 ifkz Rearranging to avoid summing the infinite tail of the distribution... z ke− z−k 1−∑k=0 k! 1−q/p Converting to C code... #include <math.h> double AttackerSuccessProbability(double q, int z) { double p = 1.0 - q; double lambda = z * (q / p); double sum = 1.0; int i, k; for (k = 0; k <= z; k++) { double poisson = exp(-lambda); for (i = 1; i <= k; i++) poisson *= lambda / i; sum -= poisson * (1 - pow(q / p, z - k)); } return sum; } 7 Running some results, we can see the probability drop off exponentially with z. q=0.1 z=0 P=1.0000000 z=1 P=0.2045873 z=2 P=0.0509779 z=3 P=0.0131722 z=4 P=0.0034552 z=5 P=0.0009137 z=6 P=0.0002428 z=7 P=0.0000647 z=8 P=0.0000173 z=9 P=0.0000046 z=10 P=0.0000012 q=0.3 z=0 P=1.0000000 z=5 P=0.1773523 z=10 P=0.0416605 z=15 P=0.0101008 z=20 P=0.0024804 z=25 P=0.0006132 z=30 P=0.0001522 z=35 P=0.0000379 z=40 P=0.0000095 z=45 P=0.0000024 z=50 P=0.0000006 Solving for P less than 0.1%... P < 0.001 q=0.10 z=5 q=0.15 z=8 q=0.20 z=11 q=0.25 z=15 q=0.30 z=24 q=0.35 z=41 q=0.40 z=89 q=0.45 z=340 12. Conclusion We have proposed a system for electronic transactions without relying on trust. We started with the usual framework of coins made from digital signatures, which provides strong control of ownership, but is incomplete without a way to prevent double-spending. To solve this, we proposed a peer-to-peer network using proof-of-work to record a public history of transactions that quickly becomes computationally impractical for an attacker to change if honest nodes control a majority of CPU power. The network is robust in its unstructured simplicity. Nodes work all at once with little coordination. They do not need to be identified, since messages are not routed to any particular place and only need to be delivered on a best effort basis. Nodes can leave and rejoin the network at will, accepting the proof-of-work chain as proof of what happened while they were gone. They vote with their CPU power, expressing their acceptance of valid blocks by working on extending them and rejecting invalid blocks by refusing to work on them. Any needed rules and incentives can be enforced with this consensus mechanism. 8 References [1] W. Dai, "b-money," http://www.weidai.com/bmoney.txt, 1998. [2] H. Massias, X.S. Avila, and J.-J. Quisquater, "Design of a secure timestamping service with minimal trust requirements," In 20th Symposium on Information Theory in the Benelux, May 1999. [3] S. Haber, W.S. Stornetta, "How to time-stamp a digital document," In Journal of Cryptology, vol 3, no 2, pages 99-111, 1991. [4] D. Bayer, S. Haber, W.S. Stornetta, "Improving the efficiency and reliability of digital time-stamping," In Sequences II: Methods in Communication, Security and Computer Science, pages 329-334, 1993. [5] S. Haber, W.S. Stornetta, "Secure names for bit-strings," In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 28-35, April 1997. [6] A. Back, "Hashcash - a denial of service counter-measure," http://www.hashcash.org/papers/hashcash.pdf, 2002. [7] R.C. Merkle, "Protocols for public key cryptosystems," In Proc. 1980 Symposium on Security and Privacy, IEEE Computer Society, pages 122-133, April 1980. [8] W. Feller, "An introduction to probability theory and its applications," 1957. 9