Main content

### Course: Computer science theory > Unit 2

Lesson 6: Primality test- Introduction
- Primality test challenge
- Trial division
- What is computer memory?
- Algorithmic efficiency
- Level 3: Challenge
- Sieve of Eratosthenes
- Level 4: Sieve of Eratosthenes
- Primality test with sieve
- Level 5: Trial division using sieve
- The prime number theorem
- Prime density spiral
- Prime Gaps
- Time space tradeoff
- Summary (what's next?)

© 2024 Khan AcademyTerms of usePrivacy PolicyCookie Notice

# Summary (what's next?)

Why is factorization hard, yet generating primes easy? Where do we go from here? Created by Brit Cruise.

## Want to join the conversation?

- does it take the same amount of time for a computer to calculate 7+4 as to 700+400?

if not how does it differ?(17 votes)- Yes it takes. One fundamental reason to this would be that you can express 7 and 4 with 3 bits but you need 10 bits for 700 and 9 bits for 400.

This means a longer stream of bits must pass through transistors to make the calculation.

This difference is insignificant, like really insignificant.(3 votes)

- cant people find a prime by first finding another prime,n, and then finding (2^n)-1?(0 votes)
- Nope. The number 2047 ((2^11)-1) factors to 23 * 89. This approach is seductive, because the first few primes do generate more primes, but as n becomes larger, the resulting numbers are usually not prime. These numbers are known as Mersense primes. More about them at http://en.wikipedia.org/wiki/Mersenne_prime.(15 votes)

- Instead of defining the step size by using prime numbers, can we use other well-known sequences that may also work, such as Fibonacci numbers or any sequence that meets Brown's Criterion?(5 votes)
- A big thing in the news right now is that NASA, Google and others are buying some of the world's first Quantum computers, which process data in Qubits instead of traditional bits. According to Wikipedia, a Qubit is able to be in either of the classic binary positions or a superposition of both; which I've (perhaps incorrectly) imagined to be a method of calculation involving levels of certainty rather then absolute values. Any ideas how this new technology might effect problems like finding Primes and Prime Factorization?(2 votes)
- I'm not entirely sure how use of qubits would change how processing works, but for memory storage, it would allow us to use the only-hop-on-primes methed, because where 12 bits would hold 1 integer less than 2^12, 12 qubits would be able to store
*all*integers less than 2^12.(2 votes)

- What if you used the sieve of Eratosthenes WHILE generating primes? You would mark all numbers that are divisible up to a certain point, then when it comes to the "certain point", just act as if the number were prime and increase the point. Or, store the primes and do a burst of checks. I will make a program to see if it works.(2 votes)
- What was the largest prime number ever found?(1 vote)
- You can Google it. It has approximately 17 million digits.(3 votes)

- So... was thinking about this problem a bit. I came up with something slightly more efficient I think... but it still wasn't sufficient for the really large numbers. First few steps involved doing a mod 5 and then a mod 2. The mod 2 was to determine that it was an odd number and the mod 5 was used to remove all numbers ending in 5. This really only helps cut out numbers if they are not prime from being checked more than necessary. This would remove all numbers ending in 0, 2, 4, 5, 6, and 8. Preventing 60% of all numbers being checked more than necessary. Only a slight improvement over only checking odds. Also, this actually decreases performance slightly on the numbers that really matter.

Instead of increasing each step by 1 to test numbers I increased by 6 and then tested with value+1 and value-1 for the following reason. All prime numbers greater than 3 are either 6n+1 or 6n-1. This is because all 6n+2 are divisible by 2, all 6n+3 are divisible by 3, and all 6n+4 are divisible by 2. This leaves 6n+1 and 6n-1 cutting down the number of values tested with to 2/6 or 1/3 of all values between 3 and square root of n. Using this method you will end up checking some values that are not prime but it should reduce the complexity significantly over other methods that only check primes and don't use a look up table.

I was thinking... maybe there might be a way to skip numbers while increasing the test value to decrease time but also decrease effectiveness (decreasing probability of being correct from 100% to maybe something slightly less). Also, maybe the amount of time could be shortened by using an array for larger numbers where the prime density is lower so you cover a larger range for memory usage say square root of your maximum input down to an acceptable memory limit and then for below this range use the above 6n+1 and 6n-1. This would probably cut out a lot of unnecessary checking on larger numbers. Do any of these sound like viable solutions to reduce the time?(2 votes)- Yes you are on the right track! I was going to post a video on 'wheel factorization' (http://primes.utm.edu/glossary/page.php?sort=WheelFactorization) Basically, we can first check if it's divisible by the first few primes (say: 2,3,5), then we imagine a wheel of size 2*3*5=30 rolling along the number line. The spokes of this wheel are all numbers relatively prime to 2,3,5. Wherever the spokes land, we assume is prime and use as next check. We can make larger wheels, for example starting with 2,3,5,7 will eliminate over 77% of the composites. Eventually larger wheels don't payoff due to memory (good ol' time space tradeoff)

So these methods, while more efficient, won't solve our problem(1 vote)

- Any chance of getting videos on pollard's p-1 and rho algorithms? How about the Chinese Remainder Theorem? Also something on WEP and why it's so easy to break. Keep up with this awesome series!(2 votes)
- Thanks for the advice/feedback Richard. WEP is coming soon and Chinese remainder is also on the TODO list. Won't be posting updates for a few months as I'd like to finish information theory first(1 vote)

- Now why is calculating how many primes in a certain space so hard? Why can't you just calculate on the prime spiral how many are in a certain space and double it to get the number you need?(2 votes)
- So why don't you find out how long it will take you to subbtract a couple numbers? Why don't we use a computer to subbtract it?(1 vote)

- Is P=NP related to time complexity?(2 votes)
- Yes. The P in P=NP means polynomial. Essentially they are categories for algorithm running speed. Do of the broadest categories are polynomial time and non-polynomial time. Multiplying two numbers is considered a problem that can be solved in polynomial time. There is no known factoring method that is polynomial time. If P=NP is proved true it would mean that anything that can be done in polynomial time can be done in reverse, in polynomial time.(1 vote)

## Video transcript

Voiceover: Congratulations. We've reached a branching
point in our lesson. Now a few different ideas
have been introduced, so it's important to orient
ourselves here before moving forward in various directions. Now, what motivated this lesson? Well, we learned about RSA encryption, and RSA depended on two things. One, that prime
factorization is difficult. So if I multiply two big
primes together, P1 and P2 and I give you N, I can trust
or feel secure in knowing that you'll take a long
time to find those primes, and maybe more than your lifetime. Also two, RSA requires
that those large primes I generated was easy, meaning I could just quickly generate a large prime. So let's return to the first problem. Difficulty of prime factorization. Well what is it about prime
factorization or just prime numbers themselves which
make problems hard? And to find out we begin
with a core problem. Given X, is X prime or composite,
which is a primality test? Now we ended up building some
solutions to this problem. And in doing so, we realize that this problem was computationally expensive. That is, there is no instant
solution to this problem. As our input number
grew, the amount of time or the amount of steps involved for an algorithm to find the solution also grew. And, how much it grew,
we actually understand better now because we can predict this search space using the
prime number theorem. Though, the real issue can
be thought of as a graph, where on the vertical axis
we have the number of steps our algorithm takes, so you
can just think of it as time. And on the X-axis is the input size and as input size increased, so did time. And the problem we had is that shape of this curve is unacceptable. Because in our case, once
N hit even 20 digits, we can no longer solve the
problem in the worst case. And if we threw in input
hundreds of digits long at our algorithm we can all
agree it would not work. In our case it would crash. So it's practically impossible
to check if large input is prime using our current strategies. Now let's return to the
first point, factorization. Realize based on our current understanding in this lesson, factorization has to be harder than a primality test. That is there are more steps
involved in finding the prime factorization of some number, versus just telling me if a number is prime. Since, remember, factorization
requires us to find all the prime factors for some input N, whereas a primality test only
requires we find one divisor And a nice reminder of this
is that some users have actually turned these
primality tests into prime factorization algorithms, simply by repeating after it
finds its first divisor. So the primality test is
just kind of a sub-routine inside the main factorization algorithm. So if we return to this
graph, a factorization algorithm would be something like this. As input grows, our
factorization algorithm would be above this primality test line. And realize that in any
situation we always have a computational limit, that is the number of primitive steps we can
calculate which is based on our computing power in any given situation and the amount of time we have. And you could think of
this as a horizontal line, or a threshold on this graph. Anything above this line is out of reach, not feasible to solve. And in this lesson we were
limited by the rover's on-board computer which was fairly slow,
which is why we couldn't run primality tests on numbers
with even 20 digits. However, even if we had,
say, 1,000 computers running for a year, this would
simply just push this horizontal line up to
some other threshold. And this would allow us
to run tests on larger numbers, but as you can see,
we would always hit some limit where the input is large enough that we can no longer solve the problems again. Now, this leads to many
interesting question paths. However, I'll identify the
two I'm going to explore next. First of all, so far I have not been very precise about these growth curves. Though, it would be
helpful if, imagine for any algorithm you give me, no
matter what it's trying to solve, and no matter what
machine it's even running on, we could draw some
corresponding growth curve on this graph, simply by looking at the description of the algorithm. If we could do this, we
could identify categories of certain growth
shapes, which tell us how difficult a problem would be to solve. And in this way, we could
understand an algorithm before we even run it, very
important to think about. And you could hand me your
algorithm written on a piece of paper and I could look
at it and I'll say, "I know "this algorithm is not solvable
with the input size needed." And this leads us to a
lesson on time complexity, an incredibly important
conceptual tool we will need. And if you've ever heard this
runs in polynomial time or exponential time, these are terms which come out of time complexity. Next, many of you have
been wondering, "well, how "are we generating these
random primes in practice," the second point I made about RSA. Well let's just walk through it quickly. To generate a large random
prime number hundreds of digits long, we need to
follow these instructions. Generate a random number,
test if it's prime, if it's prime, we're done. If not, try again until we hit a prime. Now in this three-step
procedure, what's most important is the second
step, test if it's prime. If we can't do that step, this won't work. So how is this working today? Well, it's based on a subtle
alteration to our problem definition, and more importantly,
the use of randomness. And this leads us to another
lesson on random algorithms. And now finally, if there are any other lingering question paths you have, please post them below and we can plan lessons. For example, there are
some deeper mathematics we have yet to explore in
order to speed up our existing trial division primality test. And what else are you thinking of? Please share below.