Machine Learning

Data scientists love linear regression for its simplicity. Strengthen your understanding of linear regression in multi-dimensional space through 3D visualization of linear models. This post comes with detailed scikit-learn code snippets for multiple linear regression.

Statistics

This post covers everything you need to know about confidence intervals: from introductory conceptual explanations to detailed discussions of the different techniques, their assumptions, their strengths and weaknesses, and when to use them or not.

Natural Language Processing

When you train an NLP model with Skip-Gram, a very large vocabulary imposes a high computational cost on your machine. Because the original Skip-Gram model cannot handle this cost efficiently, we use an alternative called Negative Sampling.
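As a rough illustration of the idea, the sketch below draws a handful of "negative" words from a noise distribution instead of scoring the whole vocabulary. It assumes the unigram distribution raised to the 3/4 power, as in the original word2vec paper. The word counts and the helper names `noise_distribution` and `sample_negatives` are made up for this example, and excluding the positive word is a simplification.

```python
import random

def noise_distribution(word_counts, power=0.75):
    """Unigram counts raised to `power`, normalized to sum to 1."""
    weights = {w: c ** power for w, c in word_counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}

def sample_negatives(noise, positive_word, k, rng):
    """Draw k negative words (with replacement), skipping the positive word."""
    candidates = [w for w in noise if w != positive_word]
    weights = [noise[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=k)

# Toy counts, invented for illustration only.
counts = {"the": 1000, "cat": 50, "sat": 30, "mat": 20, "quantum": 2}
noise = noise_distribution(counts)

rng = random.Random(0)
negatives = sample_negatives(noise, positive_word="cat", k=5, rng=rng)
print(negatives)
```

Only the positive word and these k sampled words take part in each training update, which is where the saving over the full vocabulary comes from.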

Natural Language Processing

Over the past couple of years, the neural networks in Word2Vec have nearly taken over the field of NLP, thanks to their state-of-the-art performance. But how much do you understand about the algorithm behind them? This post will crack the secrets behind the neural net in Word2Vec.

Natural Language Processing

How do word vectors in Natural Language Processing capture meaningful relationships among words? How can you quantify those relationships? Addressing these questions starts with understanding the multi-dimensional nature of NLP applications.
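One common way to quantify how related two word vectors are is cosine similarity. The sketch below uses tiny 3-dimensional vectors that are invented for illustration (real embeddings are learned from text and have many more dimensions). The `cosine_similarity` helper is likewise an assumption of this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

sim_king_queen = cosine_similarity(vectors["king"], vectors["queen"])
sim_king_apple = cosine_similarity(vectors["king"], vectors["apple"])
print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs apple: {sim_king_apple:.3f}")
```

Vectors that point in similar directions score close to 1, while unrelated ones score closer to 0.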

Statistics

Many statistical and machine learning techniques assume that the data are normally distributed. What options do you have if your data is not normally distributed? Transforming non-normal data toward normality with the Box-Cox transformation is one of them.
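For reference, the Box-Cox transformation maps a positive value x to (x^λ - 1) / λ when λ is non-zero, and to log(x) when λ is 0. The sketch below applies that formula for a λ you choose by hand; the `box_cox` helper and the sample data are assumptions of this example. In practice λ is usually estimated from the data, for instance by `scipy.stats.boxcox`.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox transform with a fixed lambda to positive data."""
    if any(x <= 0 for x in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]

# Right-skewed toy data, invented for illustration only.
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]

print(box_cox(skewed, 0))    # log transform
print(box_cox(skewed, 0.5))  # square-root-like transform
```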

Geostatistics

Tobler's first law of geography states that "everything is related to everything else, but near things are more related than distant things." A variogram shows how the correlation between pairs of spatial data points changes with the distance between them.
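To make this concrete, here is a minimal sketch of an empirical semivariance calculation: half the average squared difference between values separated by a given lag. It assumes measurements taken at evenly spaced points along a 1D transect. The `semivariance` helper and the data are invented for this example, and real geostatistical work would handle 2D coordinates and distance bins.

```python
def semivariance(values, lag):
    """Empirical semivariance on a regular 1D transect at an integer lag >= 1."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    if not pairs:
        raise ValueError("lag is too large for this transect")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Toy measurements along a transect, invented for illustration only.
z = [1.0, 1.2, 1.1, 1.6, 2.0, 2.1, 2.6, 3.0]

for lag in (1, 2, 3):
    print(lag, round(semivariance(z, lag), 4))
```

For data like these, the semivariance grows with the lag, meaning nearby points are more alike than distant ones.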

Others

If you have ever tried to do anything with data provided to you in PDFs, you know how painful it is: it's hard to copy and paste rows of data out of PDF files. Try tabula-py to extract the data into a CSV or Excel spreadsheet using a simple, easy-to-use interface.

Others

Are you interested in hosting your own data science blog powered by Jupyter Notebook, like this one? Take a look at the Aegis-Jupyter theme I made with Pelican. The code that runs this blog is open source and available on my GitHub repo.

Statistics

Bootstrapping is a non-parametric resampling method used in statistics and machine learning. One application is computing confidence intervals without assuming that the data follow a particular distribution, because the method is distribution-free.
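A minimal sketch of the idea, using the percentile bootstrap (one of several bootstrap variants): resample the data with replacement many times, compute the statistic on each resample, and read the interval off the sorted results. The `bootstrap_ci` helper, its defaults, and the sample values are assumptions of this example.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n observations with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Toy sample, invented for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]

low, high = bootstrap_ci(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Swapping `stat` for `statistics.median` gives an interval for the median with no other changes.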

Statistics

How do casinos earn money? The answer is simple: the longer you play, the greater your chance of losing money. Monte Carlo simulation can be used to build a profit forecast model for a casino.
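The claim can be checked with a small Monte Carlo sketch. It assumes an even-money bet on European roulette, where the player wins one unit with probability 18/37 and loses one unit otherwise. The `fraction_losing` helper, the number of simulated players, and the round counts are choices made for this example.

```python
import random

def fraction_losing(n_rounds, n_players=2000, win_prob=18 / 37, seed=0):
    """Share of simulated players whose net result is negative after n_rounds."""
    rng = random.Random(seed)
    losers = 0
    for _ in range(n_players):
        # Each round pays +1 on a win and -1 on a loss.
        net = sum(1 if rng.random() < win_prob else -1 for _ in range(n_rounds))
        if net < 0:
            losers += 1
    return losers / n_players

for n in (10, 100, 1000):
    print(n, fraction_losing(n))
```

Because each bet has a slightly negative expected value for the player, the share of players who end up behind grows as the number of rounds increases.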