1- DRY: Don’t Repeat Yourself

- DRY is a principle of software development aimed at reducing the repetition of software patterns, replacing duplication with abstractions or using data normalization to avoid redundancy. A basic strategy for reducing complexity to manageable units is to divide a system into pieces. The DRY principle is stated as “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” In other words, each of these small pieces of knowledge may occur exactly once in your entire system. Violations of DRY are typically referred to as WET solutions, commonly taken to stand for “write everything twice”, “we enjoy typing”, or “waste everyone’s time”.
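As a minimal Python sketch (the tax-rate scenario and names are hypothetical, chosen only to illustrate the principle), compare the same rule duplicated in two places with a single authoritative representation:

```python
# WET: the tax rule is duplicated, so a rate change must be applied
# in two places (and can easily be missed in one of them).
def invoice_total_wet(net):
    return net + net * 0.19

def receipt_total_wet(net):
    return net + net * 0.19

# DRY: the tax rule has exactly one authoritative representation.
TAX_RATE = 0.19

def gross(net):
    """Apply the single, shared tax rule."""
    return net + net * TAX_RATE

def invoice_total(net):
    return gross(net)

def receipt_total(net):
    return gross(net)
```

Changing `TAX_RATE` now updates every caller at once, which is the whole point of the “single, unambiguous, authoritative representation”.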

2- KISS: Keep it Simple Stupid

- The KISS principle states that most systems work best if they are kept simple rather than made complicated; simplicity should therefore be a key goal in design, and unnecessary complexity should be avoided. There are similar concepts in many areas. Occam’s razor (also Ockham’s razor or Ocham’s razor) is the problem-solving principle that the simplest explanation tends to be the right one: when presented with competing hypotheses to solve a problem, one should select the solution with the fewest assumptions. To sum up: if a task looks complicated to you, try to think outside the box and look for a simpler approach.
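A deliberately exaggerated Python sketch (the parity example and class names are hypothetical) of the same task solved with needless indirection versus the simplest thing that works:

```python
# Over-engineered: a strategy hierarchy for a one-line check.
class ParityStrategy:
    def check(self, n):
        raise NotImplementedError

class ModuloParityStrategy(ParityStrategy):
    def check(self, n):
        return n % 2 == 0

def is_even_complicated(n, strategy=ModuloParityStrategy()):
    return strategy.check(n)

# KISS: the simplest solution that solves the problem.
def is_even(n):
    return n % 2 == 0
```

Both versions behave identically; only one of them has to be read, tested, and maintained as four extra moving parts.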

3- YAGNI: You Ain’t Gonna Need It

- The YAGNI principle translates to “If it’s not in the concept, it’s not in the code.” If there’s no budget for database abstraction, there’s no database abstraction. If the unlikely event of a database change does occur, it is natural to charge for the change request (ref). A good rule of thumb: roughly 80% of the time spent on a software project is invested in 20% of the functionality. Think about your own projects! Every time I do, I am surprised by the accuracy of the 80:20 rule.
- While this concept may sound simple, it can be hard to distinguish the necessary parts from the unnecessary ones. For example, if you’re already comfortable with a library or framework that provides database abstraction, you won’t save much time by dumping it. The key concept is another way of looking at software: we’re trained to write future-proof, maintainable software, which means we’re trained to think ahead. What changes may occur in the future? That question is critical for bigger projects, but pure overhead for smaller ones. Don’t design for a future that may never come! If a small corporate website requires fundamental changes, it may have to be rebuilt from scratch; compared to the overall budget, that is not a significant problem.

The difference between YAGNI and DRY in software development is that DRY reduces complexity by dividing a project into manageable components, while YAGNI reduces complexity by reducing the number of components. YAGNI is similar to the KISS principle, as both strive for a simple solution. However, KISS achieves simplicity by implementing something as easily as possible; YAGNI achieves it by not implementing it at all!
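To make the database example concrete, here is a Python sketch (the SQLite scenario and all names are hypothetical): a speculative storage-backend hierarchy built “in case we ever switch databases”, versus just talking to the one database the concept actually calls for:

```python
import sqlite3

# YAGNI violation: a generic backend hierarchy for a database
# switch that is nowhere in the concept.
class StorageBackend:
    def save(self, key, value):
        raise NotImplementedError

class SqliteBackend(StorageBackend):
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def save(self, key, value):
        self.conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

# YAGNI-compliant: the concept only mentions SQLite, so use it directly.
def save(conn, key, value):
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
```

If a database switch ever does happen, introducing the abstraction then is a billable change request; carrying it from day one is overhead on every feature until that day.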

LDA has two hyperparameters, and tuning them changes the induced topics. The questions are: What do the alpha and beta hyperparameters contribute to LDA? How do the topics change if one or the other hyperparameter increases or decreases? And why are they hyperparameters rather than just parameters?

For the answers, see the corresponding page on datascience.stackexchange or the explanation below:

LDA uses a multivariate distribution, typically the Dirichlet distribution. LDA assumes that:

- A document can have multiple topics (it is because of this multiplicity that we need the Dirichlet distribution), and one Dirichlet distribution models this relation.
- Words can also belong to multiple topics when you consider them outside of a document, so we need another Dirichlet distribution to model this relation.

These two distributions, which you do not observe in the data, are called latent, or hidden.

Now, in Bayesian inference you use Bayes’ rule to infer the posterior probability. For simplicity, let’s say you have data x and a model for this data governed by some parameters θ. To infer values for these parameters, in full Bayesian inference you infer their posterior probability using Bayes’ rule with:

p(θ | x, α) = p(x | θ) · p(θ | α) / p(x)

Note the α that appears here. It encodes your initial belief about this distribution and is the parameter of the prior distribution. Usually the prior is chosen to be conjugate, so that the posterior belongs to the same distribution family as the prior.

The parameters of the prior are called hyperparameters. So, in LDA, both topic distributions (over documents and over words) also have corresponding priors, which are usually denoted alpha and beta and, because they are the parameters of the prior distributions, are called hyperparameters.
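The role of hyperparameters is easiest to see in the simplest conjugate pair: a Beta prior on a Bernoulli coin. This Python sketch (the coin example is an illustration, not part of LDA itself) shows that updating the posterior only moves the prior's parameters:

```python
# Beta(alpha, beta) prior over a coin's bias theta, Bernoulli likelihood.
# Because the Beta is conjugate to the Bernoulli, the posterior is again
# a Beta, and inference reduces to updating the hyperparameters.
def update(alpha, beta, data):
    """Return the posterior hyperparameters after observing 0/1 outcomes."""
    heads = sum(data)
    tails = len(data) - heads
    return alpha + heads, beta + tails

def mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Start from a uniform prior Beta(1, 1) and observe two heads, one tail.
a, b = update(1, 1, [1, 1, 0])
```

Here alpha and beta parameterize the prior over θ rather than the data model itself, which is exactly the role alpha and beta play for LDA’s two Dirichlet priors.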

Additionally, assuming symmetric Dirichlet distributions (for simplicity), a low alpha value places more weight on having each document composed of only a few dominant topics (whereas a high value returns many more relatively dominant topics). Similarly, a low beta value places more weight on having each topic composed of only a few dominant words.
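This sparsity effect can be simulated with a few lines of standard-library Python (a sketch: a Dirichlet sample is drawn by normalizing independent Gamma draws):

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)

# Low symmetric alpha: most of the mass lands on one or two "topics".
sparse = sample_dirichlet([0.1] * 5, rng)

# High symmetric alpha: the mass is spread almost evenly across topics.
even = sample_dirichlet([10.0] * 5, rng)
```

Averaged over many draws, the largest component under alpha = 0.1 is far bigger than under alpha = 10, which is exactly the “few dominant topics” behaviour described above.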
