"I meant why does the energy balance equation only consider the energy used to produce the capacity?"
(Hoo Boy)
Because we are not merely trying to satisfy the normal rising global energy demand using renewables, but at the same time we are seeking to replace one form of energy technology - the dominant one by far - with another.
Maybe some numbers will help you understand the issue:
The world's total primary energy base is currently around 180,000 to 185,000 TWh per year.
Of that, fossil fuels (excluding biomass) make up around 140,000 to 145,000 TWh (biomass is another 11,000 TWh).
Assume that you were the benevolent dictator of the world and you wanted to replace just half of that roughly 150,000 TWh of fossil-fuel- and biomass-derived energy with renewables (which would obviously need to be fully buffered to account for their inherent intermittency). You would need to deliver an additional 75,000 TWh in order to do so.
That 75,000 TWh of renewable energy equates to around 6.4 TW of installed capacity (75,000 TWh divided by 365 days, divided by 24 hours, multiplied by a load factor of, say, 75%).
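The arithmetic in that step can be sketched in a few lines of Python. This is only an illustration: the function name `capacity_tw` and the constant names are hypothetical, and the formula follows the post exactly as written (average power multiplied by the load factor). Sizing nameplate capacity would more usually divide by the load factor, which would give a larger figure (about 11.4 TW) rather than a smaller one.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, as in the post


def capacity_tw(annual_energy_twh: float, load_factor: float) -> float:
    """Capacity figure as computed in the post.

    Average power (TWh per year / hours per year) multiplied by the
    load factor. The name and signature are illustrative assumptions.
    """
    average_power_tw = annual_energy_twh / HOURS_PER_YEAR
    return average_power_tw * load_factor


fossil_and_biomass_twh = 150_000          # rounded figure from the post
target_twh = fossil_and_biomass_twh / 2   # replace half: 75,000 TWh

new_capacity_tw = capacity_tw(target_twh, load_factor=0.75)
print(f"New renewable capacity: {new_capacity_tw:.1f} TW")
```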
We want to create that extra renewable power without using fossil-fuel sources of energy (recall that the aim here is to reduce, not increase, fossil fuel demand). So, using renewable energy to manufacture and commission that new 6.4 TW of renewable capacity, the energy cost of doing so (note: "energy" cost, not capital cost or any other cost) is around 1.4 TW, or 1,400,000 MW. That is based on a blended renewable-sector ERoEI of around 4.5x (9x for buffered wind and 3x for buffered concentrated solar).
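As a rough sketch of that step: dividing the new capacity by the blended ERoEI gives the energy that has to be invested up front. The post does not say how the 4.5x blend was derived; an equal-weight harmonic mean of 9 and 3 happens to give 4.5, so that is what the code below assumes. The function names are hypothetical.

```python
def blended_eroei(values):
    """Equal-weight harmonic mean of ERoEI values.

    Assumption: the post does not state its blending method; this one
    reproduces the quoted 4.5x from 9x (wind) and 3x (solar).
    """
    return len(values) / sum(1.0 / v for v in values)


def energy_cost_tw(new_capacity_tw, eroei):
    """Energy invested to build the capacity: capacity divided by ERoEI."""
    return new_capacity_tw / eroei


MW_PER_TW = 1_000_000

eroei = blended_eroei([9.0, 3.0])      # buffered wind, buffered concentrated solar
cost_tw = energy_cost_tw(6.4, eroei)   # 6.4 TW is the capacity figure from the post

print(f"Blended ERoEI: {eroei:.1f}x")
print(f"Energy cost: {cost_tw:.1f} TW ({cost_tw * MW_PER_TW:,.0f} MW)")
```

The unrounded result is a little over 1.4 TW; the post rounds it to 1,400,000 MW.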
Where is the 1,400,000 MW renewable energy bank that is sitting idling today, which can be called upon to do the job?
[Note: the numbers contained herein are not meant to be prescriptive; rather, they are meant to be indicative of the scope and magnitude of the problem. As such, while they might not be 100% accurate, they are of the right order of magnitude. For example, I use a load factor of 75%; that might not be correct. It could be 80% or 85%, or it could be 65% or 70%. But it won't alter the overarching conclusion of the exercise.]
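The sensitivity claim in the note can be checked with a short sketch that reruns the post's arithmetic for each load factor mentioned. All names here are illustrative, and the formulas are the ones described in the post (average power multiplied by load factor, then divided by a 4.5x ERoEI).

```python
HOURS_PER_YEAR = 365 * 24
TARGET_TWH = 75_000   # half of the roughly 150,000 TWh in the post
ERoEI = 4.5           # blended figure quoted in the post


def energy_cost_tw(load_factor):
    """Energy cost in TW for a given load factor, using the post's arithmetic."""
    capacity_tw = TARGET_TWH / HOURS_PER_YEAR * load_factor
    return capacity_tw / ERoEI


load_factors = [0.65, 0.70, 0.75, 0.80, 0.85]
costs = {lf: energy_cost_tw(lf) for lf in load_factors}

for lf, cost in costs.items():
    print(f"load factor {lf:.0%}: energy cost about {cost:.2f} TW")
```

Across that range the result stays between roughly 1.2 and 1.6 TW, so the order of magnitude does not change.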