DavidQD
This is an amazing piece of work, Melody. You typed this out like someone playing Vanderbeck’s "Edelweiss Glide" waltz on the piano. I would have just popped in a CD.

With modern computer technology, it is easy to forget the complicated mathematics behind the functions, and just type in three or four numbers and accept the answer the computer generates without giving a second thought to the process behind it.

Depending on the computer program, it may look this way:

P(X ≤ 170) = binomcdf(176, 0.93, 170) = 0.98600121889…

This is great. … But …

This causes a loss of the intuitive sense of the process that develops when solving the equations manually.

Here is an overview of Melody’s output, for any who may be interested.

The basic formulas are twofold:

First, the binomial distribution:

This formula generates the discrete probability distribution of the number of successes (or failures) in a sequence of n independent binary (0 or 1) trials, each of which yields a success with probability p.

The general binomial formula:

$$\displaystyle \Pr \left({X = k}\right) = \binom n k p^k \left({1-p}\right)^{n-k}$$

This takes the probability of k successes, p^k, and of (n − k) failures, (1 − p)^(n − k), multiplied by a binomial coefficient, yielding a probability value as its product.

The successes (or failures) can occur anywhere among the n trials, and the number of possible arrangements grows combinatorially, not linearly; the sample space changes by very large, non-linear amounts with small changes in n. The binomial coefficient counts these arrangements, distributing the probability over the sample space and generating a density (mass) function with a point of central tendency.
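For anyone who wants to see the formula in action, here is a minimal sketch in Python (the function name and layout are mine, not from any particular calculator):

```python
from math import comb

def binomial_pmf(n: int, p: float, k: int) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The numbers from this thread: 176 tickets sold, each passenger shows up
# independently with probability 0.93.
print(binomial_pmf(176, 0.93, 166))  # roughly 0.100 (the "exactly 5 empty seats" case worked further below)
```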

Second, the CDF – Cumulative Distribution Function:

This question requires a CDF. CDFs can have complicated theoretical applications (translation: I do not fully understand it, so how can I explain it?). However, in this case it is straightforward: simply add the individual terms. This totals the individual probabilities up to a boundary. In this case, the total beyond the boundary (171 or more passengers showing up) is subtracted from one to give the probability of the sample group having a seat, instead of not having a seat.

$$\Pr\left({X \le 170}\right) = 1 - \Big[\binom{176}{171}(0.93)^{171}(0.07)^{5} + \binom{176}{172}(0.93)^{172}(0.07)^{4} + \binom{176}{173}(0.93)^{173}(0.07)^{3}\\
+ \binom{176}{174}(0.93)^{174}(0.07)^{2} + \binom{176}{175}(0.93)^{175}(0.07)^{1} + \binom{176}{176}(0.93)^{176}(0.07)^{0}\Big]\\
= 0.98600121889$$

 

(A reasonable number of decimal places are included here).

Note that the last binomial coefficient and the exponent of 0 each resolve to a value of 1. They are included for continuity. With the above data, other observations are statistically discernible.
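As a quick sanity check, the same tail sum can be reproduced in a few lines of Python (a sketch written for this post, not the calculator's internals):

```python
from math import comb

n, p = 176, 0.93

# P(X <= 170) = 1 - P(X >= 171): add the six upper-tail terms, subtract from 1.
upper_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(171, n + 1))
print(1 - upper_tail)  # ~0.98600121889, matching binomcdf(176, 0.93, 170)
```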

-----

Here is an example of where a discrete binomial is used (this does not require a CDF):

What is the probability that the flight departs with exactly 5 empty seats?

$$\displaystyle \binom {176} {166} {0.93}^{166} \left({0.07}\right)^{10}
\hspace{5pt} \ = \ 0.100033 \\$$

Often, in this forum, you are referred to in terms of royalty: a queen. Here is an honorary title earned and bestowed with honor and respect, because of your skill, instead of the randomness of a birthright.

I concur! 

~~D~~

Jul 4, 2014
-----

The above is an approximation. A crude approximation. Your hardware is very fast; however, even if it were a thousand times faster, the difference would only matter in an absurd context: it reduces the estimate from 550,000 times the age of the universe to 550 times the age of the universe.

To give a more accurate assessment, information on how you want to attempt to factor the number is required. What method? Brute force? If so, the answer above is close enough.

The instruction count per operation is small for addition and multiplication, but it grows quickly once the values are larger than 64 bits (assuming a 64-bit computer). Along with the arithmetic itself, the computer has to keep track of the values and the addresses of those values. Past the 64-bit mark, more memory is needed just to keep track of the memory already in use. It becomes like a snake consuming itself via its tail.
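As a rough illustration of that overhead, here is a small timing sketch (Python's arbitrary-precision integers stand in for the bignum bookkeeping described above; exact timings will vary by machine):

```python
import random
import timeit

# Time multiplications of random n-bit integers. Once the operands no longer
# fit in a single machine word, each multiply costs progressively more.
for bits in (64, 1024, 8192, 65536):
    a = random.getrandbits(bits) | 1
    b = random.getrandbits(bits) | 1
    seconds = timeit.timeit(lambda: a * b, number=2000)
    print(f"{bits:>6}-bit operands: {seconds:.4f} s for 2000 multiplications")
```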

Computational milestones of general factorization:

Factoring of 542 bits (163 digits), in 1993

Factoring of 774 bits (233 digits), in 2000

Factoring of 809 bits (244 digits), in 2003

Factoring of 911 bits (275 digits), in 2006

Factoring of 1039 bits (313 digits), in 2007

Then came the factoring of a 1061-bit (320-digit) number, the Mersenne number 2^1061 - 1, between early 2011 and 4 August 2012, by a group headed by Greg Childers at CSU Fullerton, using the nfs@home BOINC project for about 300 CPU-years of sieving; the linear algebra was run on the Trestles cluster at SDSC and the Lonestar cluster at TACC. (Source: wikipedia.org/wiki/Integer_factorization_records)

Consider that these used the most advanced algorithms of their day and supercomputer arrays, not single computers. Keep in mind that this is just the computational time; it does not include the preliminary research and hardware setup time.

The most recent individual record is from September 2013, when the 696-bit RSA-210 was factored by Ryan Propper (ibid). (This still was not done on a single computer.) It was a brilliant piece of work, but the fact is, there is an element of chance involved. The numbers sometimes conveniently converge.

Repeating the process for a different composite value requires the same “average” amount of time. Solving one does not speed up the process for solving another.

The best factorization algorithm is called the General Number Field Sieve (GNFS). This speeds up the process by a factor of 1,000 to 10,000 or more, but only if parallel processing is used; implementing it on a single computer would not save any time (though it might for much smaller numbers). The general idea is that it allows a "deconstruction" of the problem into sections, which are processed in parallel and then reassembled.

Considering the General Number Field Sieve, there is a great deal involved in factoring large numbers. The required processing power grows enormously with the size of the number. (The flavor of P vs NP shows clearly here: the solution, if known, is easy to verify; if the solution is unknown, it is very difficult to find.)

There must be an initial effort by very intelligent, seasoned mathematicians to choose proper polynomials and other parameters. Seasoned mathematicians still have a difficult time with this. It requires weeks of preliminary testing to verify general accuracy.

GNFS consists of two parts, the sieve and the linear reduction. The sieving can be run in parallel over hundreds or thousands of machines. The linear reduction involves computing with a matrix several orders of magnitude too large for a single machine, run on a computer array that has terabytes of RAM.

The matrix contains residuals and is sparsely populated. There are algorithms to compress and compute in this format. However, this too adds computational overhead. (One brilliant approach is to over-sample with the sieve. This speeds up the process by making the matrix more efficient).
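As a toy illustration of storing and computing with such a matrix in compressed form (this uses SciPy's general-purpose CSR format; real GNFS implementations use specialized block Lanczos or block Wiedemann solvers):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny stand-in for the relation matrix: mostly zeros, stored compressed.
dense = np.array([[1, 0, 0, 1],
                  [0, 0, 1, 0],
                  [0, 1, 0, 0]], dtype=np.uint8)
sparse = csr_matrix(dense)

print(sparse.data, sparse.indices, sparse.indptr)  # only the nonzero entries are stored
print((sparse @ sparse.T).toarray())               # arithmetic can stay in the compressed format
```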

This number is large enough that the consensus estimate calls for 7 to 10 years of advances in computational power before it will be feasible to attempt, and then 2 to 5 years to complete.

Your computer, if optimized, would require months to factor a 512-bit number. If 4 months were the average and each additional bit doubled that time, so that 513 bits takes (the 512-bit time × 2^1) and 2048 bits takes (the 512-bit time × 2^1536), it starts to become clear why the time is 550,000 times greater than the age of the universe. (It is not melodrama, though it sounds like it.)
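To put rough numbers on that doubling model (taking the 4-month, 512-bit baseline at face value; this naive model actually overshoots the 550,000-times figure by an enormous margin, which only reinforces the point):

```python
from math import log10

MONTHS_FOR_512_BITS = 4          # assumed baseline from the text above
AGE_OF_UNIVERSE_YEARS = 13.8e9

# "Each extra bit doubles the time": 2048 bits costs 4 * 2**(2048 - 512) months.
months = MONTHS_FOR_512_BITS * 2 ** (2048 - 512)
years_log10 = log10(months) - log10(12)
ratio_log10 = years_log10 - log10(AGE_OF_UNIVERSE_YEARS)

print(f"years required       : roughly 10^{years_log10:.0f}")
print(f"ages of the universe : roughly 10^{ratio_log10:.0f}")
```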

You could program the computer to "guess" the primes. If this is an RSA key, then the primes will be relatively far apart (read an RSA text to understand what "relatively far apart" means). Generate random large odd numbers, then use a Rabin-Miller primality test to (almost certainly) confirm whether each one is prime. Do this until two large primes are found, then multiply them. If the product equals the RSA key, the solution is known, the totient can be calculated, etc., etc., etc.
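Here is a sketch of that guessing loop. The Miller-Rabin routine is a standard textbook version written for this post, and the demonstration at the end uses a tiny toy modulus, because for a real 2048-bit key the loop would, for all practical purposes, never finish:

```python
import random

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def guess_factors(N: int, bits: int):
    """Guess random primes of the given size until their product equals N.
    Hopeless for real 2048-bit keys -- which is the whole point of this post."""
    while True:
        p = random.getrandbits(bits) | 1          # random odd candidate
        if not is_probable_prime(p):
            continue
        q = random.getrandbits(bits) | 1
        if is_probable_prime(q) and p * q == N:
            return p, q

# Toy demonstration only: two 8-bit primes, so the loop actually finishes.
print(guess_factors(251 * 241, 8))
```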

Or consider: after finding a large prime, divide the RSA key by this prime; if the quotient is another prime, you have found the solution. Another variation is keeping track of the primes already found and comparing them sequentially, moving up the array. This sounds easier than it is. Here is why.

A 2048-bit number is 617 decimal digits long. A value this size has approximately 7.04E613 prime numbers below it, and many of them will be hundreds of digits long. The storage space for this is enormous. Here is another perspective, comparing the probability of correctly guessing the primes. Here, all odd numbers are considered before the primality test.
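(As an aside, that 7.04E613 figure checks out against the prime-counting approximation π(N) ≈ N / ln N; a two-line check:)

```python
from math import log, log10

# pi(N) ~ N / ln N for N ~ 10**617 (a 617-digit number).
primes_log10 = 617 - log10(617 * log(10))
print(f"about 10^{primes_log10:.2f} primes below a 617-digit number")  # ~10^613.85, i.e. about 7.0E613
```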

What is the probability of choosing 1 of the 2 correct primes among all odd numbers?

2^2047 = the number of odd numbers below 2^2048 (i.e., among all 2048-bit values).

1.75E8 = the reciprocal of the probability of a US Powerball lottery win.

$$\frac{\log\left(2^{2047}\right)}{2\log\left(1.75\times 10^{8}\right)}=37.38$$

This is the number of consecutive Powerball wins needed to reach the same odds; i.e., it is equivalent to winning the Powerball lottery more than 37 times in a row, with a single ticket for each event.

Could this happen? Yes. Is it likely to? NO!
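For anyone checking that figure, it reproduces in a couple of lines, using the post's own inputs (2^2047 odd candidates, 1-in-1.75×10^8 Powerball odds):

```python
from math import log10

# How many consecutive Powerball wins match the same order of improbability?
print((2047 * log10(2)) / (2 * log10(1.75e8)))  # ~37.38
```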

Another major step in reducing the search space is the search for p or q. Except in the case where p = q, so that p^2 = N or q^2 = N (never acceptable for RSA), p lies in the bottom half of the range and q in the top half; the center of the search space is sqrt(N).

This is logically obvious, but knowing it does not change the time required to find the complement. Either N is divided by p and the result tested for primality, or values of q are generated and iterated up or down until it is confirmed that no prime q exists for the selected value of p. How far to search before trying another prime p is another part of the guessing process. There are many variations of this process.

If this were easy then it would not be secure.

------------

I would appreciate another mathematician verifying the lottery correlation analogy probability (and any other math in this text).

~~D~~

Jun 11, 2014